PREFACE


This  report  is  comprised  of  a  selection  from  the  papers  presented  at 
the  144th  Annual  Meeting  of  the  American  Statistical  Association  (ASA) 
in  Philadelphia,  Pennsylvania,  August  13-16,  1984.  Most  of  the  papers 
in  the  volume  deal  with  methodological  and  substantive  studies  related 
to  the  Survey  of  Income  and  Program  Participation  (SIPP)  and  the  Income 
Survey  Development  Program  (ISDP),  an  experimental  program  designed  to
test  procedures  used  in  conducting  SIPP.

The  SIPP  is  a  new  Census  Bureau  survey  collecting  data  that  will  help 
measure  income  distribution  and  poverty  throughout  the  country  more 
accurately.  These  data  will  be  used  to  study  Federal  and  State  aid 
programs  (such  as  food  stamps,  Social  Security,  Supplemental  Security 
Income,  Aid  to  Families  with  Dependent  Children,  Medicaid,  Medicare, 
and  others),  to  estimate  future  program  costs  and  coverage,  and  to 
assess  the  effects  of  proposed  changes  in  program  eligibility  rules 
or  benefit  levels. 

Households  in  the  survey  will  be  interviewed  at  4-month  intervals  over 
a  period  of  2  1/2  years.  The  reference  period  will  be  the  4  months 
preceding  the  interview.  In  all,  about  20,000  households  will  be
interviewed,  approximately  5,000  each  month.  Field  operations  will  be
handled  through  the  Census  Bureau's  12  regional  offices. 

Recurring  questions  will  deal  with  employment,  types  of  income,
noncash  benefits,  assets,  liabilities,  and  taxes.  Periodic  questions
will  be  added  dealing  with  school  enrollment,  marital  history,
migration,  disability,  and  other  topics.  Additional  supplemental  questions
will  also  be  added  to  the  SIPP  questionnaire  as  the  need  arises. 

Preliminary  tabular  data  from  the  SIPP  are  distributed  as  part  of  the
Current  Population  Report  series  P-70,  Economic  Characteristics  of 
Households  in  the  United  States.  The  first  report  covering  the  third 
quarter  of  1983  was  released  in  September  1984.  In  addition,  the 
first  in  a  series  of  microdata  files  from  the  SIPP  is  now  available. 
This  file  contains  the  results  of  interviews  conducted  between  October
1983  and  January  1984.  Microdata  files  for  Waves  2  through  9  of  the
1984  SIPP  Panel  will  be  released  approximately  every  4  months,  with
Wave  2  appearing  in  February  1985.




Contents.  As  shown  in  the  Table  of  Contents,  the  grouping  of  the
papers  (and  accompanying  discussion  comments)  basically  is  in  keeping
with  the  ASA  sessions  at  which  the  presentations  were  originally 
given.  There  were  four  "Survey  of  Income  and  Program  Participation" 
sessions,  two  of  which  were  included  in  the  Social  Statistics  Section 
and  two  in  the  Survey  Research  Methods  Section  of  the  ASA  meetings. 
These  sessions  covered  a  range  of  topics,  both  methodological  and 
substantive,  about  longitudinal  surveys  and  SIPP.  Finally,  the  Social 
Statistics  Section  sponsored  a  session  on  "Case  Studies  in  Panel 
Survey  Design:  The  International  Experience."  This  session  provided 
an  opportunity  for  individuals  involved  in  the  design  and  development 
of  longitudinal  surveys  to  share  experiences  and  discuss  issues  of 
mutual  interest. 

The  contents  of  the  report  were  prepared  by  the  individual  authors  for 
publication  in  the  1984  American  Statistical  Association  Proceedings. 
For  this  reason,  the  format  conforms  basically  to  that  required  by  ASA. 
SURVEY  OF  INCOME

AND 

PROGRAM  PARTICIPATION: 

SESSION  I 


This  section  comprises  five  papers  presented
in  this  session,  which  was  sponsored  by  the  Section
on  Social  Statistics.


AN    ANALYSIS    OF    INTRA-YEAR   UNEARNED    INCOME    FLOWS    ON   THE    ISDP 
PAT   DOYLE,    MATHEMATICA    POLICY   RESEARCH,    INC. 


This paper presents excerpts from a study
conducted for the Food and Nutrition Service,
USDA (Doyle, 1984). The overall objective of the
study was to provide the foundation upon which
improvements could be made to the simulation of
Food Stamp Program costs and caseloads. The
simulations are currently conducted on CPS-type
files, which lack detail on intra-year income
receipts. The Food Stamp Program, however, has a
monthly accounting period, thus requiring that
the simulation model first allocate annual income
to monthly amounts. In so doing, the current
food stamp model relies on simple assumptions
concerning variation in unearned income. These
assumptions were made in the absence of information
on intra-year income flows. Essentially,
the model assumes that, with the exception of
welfare and unemployment compensation benefits,
unearned income is received evenly throughout the
year. Examination of data from the ISDP shows,
as expected, that for some sources this assumption
is sufficient. However, for other income
sources, the observed intra-year income fluctuations
are large. Therefore, in order to
improve the model of the Food Stamp Program,
existing procedures for allocation of unearned
income should be changed. This presentation
reports on the monthly variation in income recipiency
and amounts for unemployment compensation,
workmen's compensation, asset income, and other
unearned income exclusive of welfare.

THE  DATA 

The focus of this research was guided by the
information available on CPS-type files. One
important source of information on these files
which contributes to the understanding of
intra-year patterns of recipiency is the duration of
the work period within a calendar year. That is,
part-year workers can be distinguished from
full-year workers. Hence, one dimension of the
descriptive statistics which follow is work
versus non-work.

Furthermore,    since   the   improvements   to  the 
simulation  do  not   extend   to   the   imputation  of 
fluctuations   in  household  composition  within  the 
year  on  the  CPS,   an  analysis  file  was   designed  to 
replicate   the   cross-sectional   nature  of    the  CPS 
files.      That   is,    individuals   in  households  as 
they   existed    in  Wave    5  were  extracted   along  with 
retrospective   longitudinal   data  on   income 
receipts  during    the  calendar  months   of    1979.      In 
order  to  target   the  research  to  the   potentially 
eligible  population,    the  universe  for  the  study 
was  further  restricted   to  low  income  households. 
Low income households were defined as households
with composition defined as of the Wave 5 interview
where the total monthly income aggregated
over all members was less than twice the Food
Stamp eligibility limit for at least one month
during calendar year 1979.* For the initial
screen,    if  an  observation  reported  recipiency   for 
an  income   source   but   did   not    report    the  dollar 
amount  it  was  counted  as  receipt  of  a  zero 
amount.      Individuals   not    present   in  the  sample 
for  the  entire   reference   period  were   treated  as 
non-recipients   of    income   during    their  absence. 
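
The low-income screen described above can be sketched in a few lines of Python. This is an illustration only, not the ISDP/RAMIS II processing: the function name, the data layout (twelve monthly household income totals, with unreported amounts already counted as zero) and the dollar values are assumptions, and a single eligibility limit is used although the actual limit depends on household size.

```python
from typing import Sequence


def is_low_income(monthly_income: Sequence[float],
                  eligibility_limit: float) -> bool:
    """Apply the screen: household income below twice the Food Stamp
    eligibility limit in at least one month of the calendar year."""
    if len(monthly_income) != 12:
        raise ValueError("expected 12 monthly household income totals")
    return any(income < 2 * eligibility_limit for income in monthly_income)


# Hypothetical household whose income dips in one month only.
household = [1500.0] * 11 + [700.0]
print(is_low_income(household, eligibility_limit=500.0))  # True: 700 < 2 * 500
```

Because the test is "at least one month," a household with a high annual income but a single bad month still enters the universe, which is the point of screening on monthly rather than annual income.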


There were 9,383 individuals extracted, 7,468 of
whom were present in the ISDP sample all 12
months of calendar year 1979.

THE   RESULTS 

Workmen's Compensation. For this source the
ISDP provided mixed results. There were extremely
few observations (71 individuals reporting
receipt during the year, 65 of whom were
part-year workers). Furthermore, individual reported
amounts were observed to be unreasonable (one
observation reported receipt of $16,666 in each
of three months). The question still remains,
therefore, as to whether or not to allocate this
income source solely to periods of non-work. The
ISDP data showed that more than half of the
respondents (53% based on weighted counts)
reported receipt during the work period. Of
those, 57% received workmen's compensation only
during the work period; the remainder received it
during both work and non-work periods.

For purposes of modeling food stamp eligibility,
this income source was lumped with the
residual other income category discussed below.
As   described   there,    duration   of   receipt   was   first 
randomly    imputed   and    then  the  actual   calendar 
months    of   recipiency  were   randomly   assigned. 
This   stochastic    process  was   independent   of    the 
assignment   of  work  versus   non-work   periods. 

Unemployment Compensation. In the analysis of
unemployment compensation the issue of whether or
not to allocate recipiency evenly over the year
is irrelevant because this source is directly
related to labor force activity. It is currently
assumed that when both receipt of unemployment
compensation and weeks of unemployment are
reported, the periods coincide. Therefore,
the key issue is to what extent UI recipiency
coincides with work periods. Table 14 displays
recipients of unemployment compensation by
duration of the work period and by the coincidence
of work and receipt of this income
source. Months working in this case (as in all
other cases in this report) are months during
which earnings are reported. This table shows
that 17.6% of UI recipients in poor households
were full-year workers. An additional 42% were
part-year workers reporting UI receipt while
working. This seems unusually high compared to
other estimates. Burtless (1983) reports that 6%
of UI recipients in a given week are underemployed
rather than unemployed. It is useful,
therefore, to examine the recipients more
closely. Table 15 shows the duration of the
coincidence of work and UI receipt for the 2.6
million workers. In over 95% of the cases the
periods of receipt of UI did not encompass the
full period of work. Over half of this group
received unemployment compensation during only
one of the months reported working. An additional
30% reported receipt during only two
months of working. This suggests that the coincidental
period was one of transition. That is,
either a job was lost early in the month and UI
benefits are received later in the month or UI


month  of    overlap    reflect    individuals   with  only 
one    transition   period   and   the    remaining   30% 
represent    cases   with   two    transition  periods. 
Fifteen   percent    of    the   cases   had   more    than    two 
months   of    coincidental    receipt   of    earnings    and 
UI.      These   include   both  cases   where  more   than   two 
transitions   occurred   as   well   as   underemployed 
individuals. 

These   results  suggest   that  a   significant 
number   of   the  cases   reporting  earnings  and  UI 
receipt   concurrently  were  in  a   transition 
period.      This   is   further  supported   by  the 
fluctuation   in  average    benefits   displayed    in 
Table 16. Average UI benefits received during
the  worked   period  were  $190  whereas  average 
benefits   received  during  the  nonwork   period  were 
$316.      When  disaggregated    by  whether  or  not 
individuals   received  UI   benefits   during   both 
periods    of   work   and    nonwork,    the   contrast   in 
average  benefits   across   the   two   periods   is 
increased.      Recipients   during   both  periods   report 
an  average   of   $191   in  the  work   period  and   $331   in 
the   nonwork   period.      Persons    receiving   benefits 
solely  during  non-work   periods   reported  an 
average   of    $304  and   persons    receiving   benefits 
solely  during  work   periods   reported   $189. 

Based on these results the allocation of UI
within the year on CPS-type files will be modified.
The process will be to first derive
expected duration of receipt of UI from the
reported annual amount, incorporating regulations
regarding maximum weekly benefits and maximum
weeks of receipt. Then the period of receipt
will be allocated to periods of unemployment. If
the duration of receipt exceeds the unemployment
period, excess months of recipiency will be
allocated to periods of work immediately
surrounding periods of unemployment.
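
A minimal sketch of this allocation rule is given below in Python. It is illustrative rather than the procedure actually implemented for the FNS model. Several details are assumptions not stated in the text: the function and parameter names, the use of the shortest spell consistent with the maximum weekly benefit as the "expected duration," the conversion of weeks to months at 52/12 weeks per month, and the even spread of the annual amount over the months selected.

```python
import math
from typing import List, Sequence


def allocate_ui(annual_ui: float,
                unemployed: Sequence[bool],
                max_weekly_benefit: float,
                max_weeks: int,
                weeks_per_month: float = 52 / 12) -> List[float]:
    """Return 12 monthly UI amounts (index 0 = January)."""
    if len(unemployed) != 12:
        raise ValueError("expected 12 monthly unemployment flags")
    if annual_ui <= 0:
        return [0.0] * 12

    # Step 1: expected duration. Assumption: the shortest spell that the
    # maximum weekly benefit allows, capped at the maximum weeks of receipt.
    weeks = min(max_weeks, math.ceil(annual_ui / max_weekly_benefit))
    n_months = min(12, max(1, math.ceil(weeks / weeks_per_month)))

    # Step 2: fill months of unemployment first, in calendar order.
    idle = [m for m in range(12) if unemployed[m]]
    chosen = idle[:n_months]

    # Step 3: spill excess months into the work months that lie closest
    # to a month of unemployment.
    if len(chosen) < n_months:
        work = [m for m in range(12) if not unemployed[m]]
        work.sort(key=lambda m: (min((abs(m - u) for u in idle), default=0), m))
        chosen += work[:n_months - len(chosen)]

    # Step 4: spread the annual amount evenly (an assumption).
    monthly = [0.0] * 12
    for m in chosen:
        monthly[m] = annual_ui / len(chosen)
    return monthly


# Hypothetical case: $1,200 reported for the year, unemployed in June and July.
flags = [m in (5, 6) for m in range(12)]
result = allocate_ui(1200.0, flags, max_weekly_benefit=100.0, max_weeks=26)
# Twelve weeks of benefits span three months, so May (index 4) is added
# to June and July, and each of the three months receives 400.0.
```

The spill-over step is what captures the transition months discussed above: receipt is placed next to the unemployment spell rather than scattered across the work period.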

Asset Income. Table 17 shows the frequency
counts of persons in low income households by
number of months receiving income from assets.
This category includes interest income from
savings accounts, etc., dividend income from
stocks and bonds, and rental income from property.
Overall, 55% of the population reporting
asset income received that income the full year.
Of the remaining persons with asset income, the
distribution by number of months received is
biased due to the way in which asset income data
were gathered in the ISDP. For many sources
included here, lump sum quarterly or semi-annual
amounts were requested by the interviewer depending
on whether the individual fell in the three
month recall sample or the six month recall
sample. Recipiency for these cases was evenly
allocated over the relevant period, thus introducing
a bias into the examination of recipiency
patterns within the year.

Table 17 also shows the differences in recipiency
patterns for elderly and non-elderly
(elderly   in  this  case  refers    to   persons   age   60 
and  older).      For  the  older  population,    the 
proportion  of    individuals   receiving   asset   income 
for   12  months   is   72%,    which  is  significantly 
higher  than  the  overall  population.      Forty-nine 
percent    of   the  non-elderly  recipients  received 
asset   income   the  full  year. 

Based on the fact that such a high proportion
of the asset income recipients received that
source the full year and that the ISDP questionnaire
design prevents a detailed analysis of
intra-year patterns of recipiency, asset income
recipiency on the FNS data base will be evenly
allocated across calendar months.

Table    18   shows    the   change    in  average   monthly 
asset   income   between  work  and  non-work   periods 
for  persons   in  low-income   households   who   received 
asset   income   during  both  periods  and    for   persons 
whose  assets  were   less   than  $3,000.      The  asset 
screen  was   imposed   to  further  isolate   individuals 
who  were  potentially  eligible    to   receive   food 
stamps.      For   the   potentially  eligible   group, 
asset   income   did   not   vary   significantly  across 
periods   of   work  and  non-work.      Potentially 
eligible   elderly   individuals    reported   $10  on 
average   during  both  periods  while   the  non-elderly 
reported   $5  during   the  nonwork   period  and   $4 
during   the  work   period. 

Based   on  the  observation  that   individuals    in 
low  income   households   with  small  amounts   of 
assets  tend  not   to  have    significant  variation 
within   the   year,   asset   income   amounts   in  the   FNS 
data   base  will  be  evenly  allocated   across 
calendar  months. 

Other Income. The pattern of recipiency of
other income varies significantly for elderly and
non-elderly recipients. Table 19 shows that 77%
of the elderly report receiving other income
continuously throughout the year whereas only 24%
of non-elderly received this income amount for
the full 12 months. Recipiency patterns for
non-elderly were further disaggregated by labor force
activity but there did not appear to be a significant
difference between patterns of recipiency of
part-year workers and the rest of the non-elderly
population.

Recognizing that this residual income category
includes income sources such as lump sum payments
which are received intermittently throughout the
year as well as regularly received sources such
as pensions and social security, the non-elderly
recipients were further examined by type of other
income received. Table 21 shows the distribution
of non-elderly recipients in poor households by
number of months receiving regular and irregular
income. Regular income consists of social security,
railroad retirement, veterans benefits, and
other disability payments excluding workmen's
compensation, and private and government pensions.
Irregular income includes all other
sources not classified in the previous sections
or classified as regular income. Over one-third
of the recipients of regular income received that
source for 12 months whereas only 15 percent of
the recipients of irregular income received that
source for the full year. In both cases, the
majority of the remaining recipients fall at the
lower end of the distribution. The balance of
the recipients in both cases are probably fairly
evenly distributed but apparent reporting problems
on the ISDP tend to bias the observed distribution.
The apparent reporting problems
result from respondents' tendency to report
changes in recipiency status more often between
waves than between consecutive pairs of months
(Moore and Kasprzyk, 1984).

Table 22 shows the change in average amounts
received for other income across periods of work
and non-work separately for elderly and
non-elderly recipients. Overall, average income
amounts were 20% higher during work periods. As
was true for recipiency, there was a significant
difference between elderly and non-elderly income
flows. For the elderly in low income households
who received other income during work and
non-work periods, amounts received while working were
approximately the same as those received while
not working. When restricted further to elderly
individuals with low assets, more variation
existed but average amounts received while working
were only six percent less than amounts
received while not working. Non-elderly individuals,
on the other hand, reported average benefits
received while working to be 130 percent of
those received while not working. When restricted
to individuals with low assets, the variation
was reduced but still remained high with amounts
received during work periods 20 percent higher
than amounts received while not working.

Again this analysis of the ISDP data suggests
that for the elderly an even allocation of other
income  across   months  would  be  sufficient    for 
modeling   food  stamp  eligibility.      However,    for 
non-elderly,   this  method  would  understate  the 
variation  in  amounts  received   from  this   source. 

Based  on  the  distribution  presented  here,   the 
allocation  of   other  income   recipiency  on  the  Food 
Stamp  data  bases  will  be  a  stochastic  process 
whereby  duration  is  randomly  determined   and   then 
a  period  of  recipiency  is  stochastically  assigned 
within  the  twelve  month  reference  period.        The 
probability  upon  which  the  duration  will  be 
determined  will  be  derived   from  the  distributions 
presented  in  Tables    19  through  21.      Recognizing 
that   the  irregularity  of   the  distributions   is 
caused,   at   least   to   some  extent,   by  reporting 
patterns  on  the  ISDP,   they  will  be  smoothed  out 
in  the  construction  of  a  cumulative   probability 
function.      Also,   workmen's   compensation  will   be 
included  with  regular  income.      Amounts  received 
will  be  evenly  allocated   over  the  imputed  period 
of   recipiency. 
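
The stochastic process just described can be sketched as follows in Python. This is a hedged illustration, not the FNS implementation. The function names are hypothetical, the example weights are the regular-income percentages from Table 21 used unsmoothed (the smoothing method is not specified in the text), and the assignment of a single contiguous spell with a uniformly random starting month is an assumption.

```python
import bisect
import itertools
import random
from typing import List, Sequence


def cumulative(weights: Sequence[float]) -> List[float]:
    """Turn duration weights for 1..12 months into a cumulative function."""
    total = float(sum(weights))
    return [c / total for c in itertools.accumulate(weights)]


def impute_other_income(annual_amount: float,
                        duration_weights: Sequence[float],
                        rng: random.Random) -> List[float]:
    """Return 12 monthly amounts for one recipient (index 0 = January)."""
    cdf = cumulative(duration_weights)

    # Step 1: draw the duration of receipt by inverting the cumulative
    # probability function (first duration whose cumulative share exceeds u).
    u = rng.random()
    duration = min(bisect.bisect_right(cdf, u), len(cdf) - 1) + 1

    # Step 2: place one contiguous spell at a random start month
    # (assumption: the text does not say the months must be contiguous).
    start = rng.randrange(0, 12 - duration + 1)

    # Step 3: allocate the annual amount evenly over the imputed period.
    monthly = [0.0] * 12
    for m in range(start, start + duration):
        monthly[m] = annual_amount / duration
    return monthly


# Illustrative weights: percent of non-elderly regular-income recipients
# by number of months receiving (1 through 12), taken from Table 21.
regular_weights = [13, 7, 11, 2, 4, 7, 2, 2, 6, 2, 4, 38]
months = impute_other_income(2400.0, regular_weights, random.Random(1979))
```

Seeding the generator, as in the usage line, keeps an imputation run reproducible; the draw itself remains independent of the assignment of work and non-work periods, as the text requires.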

FOOTNOTES 

The  food  stamp  eligibility  limits  were  those 
in  effect  July  1,  1979.   These  reflect  the  OMB
poverty  guidelines  for  mid  1978  updated  by  the 
CPI. 

As  is  true  for  other  unearned  income  sources, 
the  CPS  does  not  measure  duration  of  receipt. 

As  is  true  for  all  unearned  income  sources, 
the  CPS  does  not  measure  duration  of  receipt  of 
other  income. 


TABLE  14

                      Receipt             Recipients
Work Status           Period          Count (1000)       %

Full Year Workers     Work only            778.27      17.6
Nonworkers            Nonwork only         198.31       4.5
Part Year Workers     Both                1679.63      37.9
                      Nonwork only        1595.64      36.0
                      Work only            182.46       4.1

Total                                     4434.31     100.0

SOURCE:  Prepared by Mathematica Policy Research using the
         ISDP/RAMIS II system.

NOTE:    This table is based on 371 observations who reported
         unemployment compensation and who were present in the
         sample the full year.
TABLE  17
DURATION  OF  RECEIPT  OF  ASSET  INCOME 


Number  of  Months  Number  of 
Receiving  UI      Months 
While  Working     Worked 


Recipients 
Count  X 
(1000) 


3-6 
7-11 


126.5 

846.8 

338.4 

1,437.6 

4.8 
32.1 
12.8 
54.4 

0 
144.3 
367.4 
283.2

0 
5.5 
13.9 
10.7 

794.9 

30.1 

36.1 
215.1 
156.6 

1.4 
8.1 
5.9 

406.9 

15.4 

2640.4 

100.0 

SOURCE:  Prepared by Mathematica Policy Research
using the ISDP/RAMIS II system.

NOTE:  This table is based on 210 observations
who were interviewed over the entire
calendar year. There were an additional
27 observations reporting
receipt of UI while working who were
not interviewed during at least one
wave.

TABLE  16 

AVERAGE UNEMPLOYMENT COMPENSATION
FOR PART-YEAR WORKERS BY WORK STATUS


Period  of 
Receipt 


Work  Only 
Nonwork  only 
Both 


SOURCE:  Prepared by Mathematica Policy
Research using the ISDP/RAMIS II
system.

NOTE:  This table is based on 334 observations
reporting UI receipt who
also reported amounts; that is,
cases of nonresponse have been
screened out.


Number of         Elderly             Nonelderly               Total
Months          Count               Count                Count
Receiving       (1000)      %       (1000)      %        (1000)      %

    1            176.1      1       1096.7      2        1272.8
    2            338.6      2        746.3      2        1084.9
    3            290.0      2       2305.6      5        2595.6
    4            308.1      2       1060.2      2        1368.3
    5            264.8      1        968.4      2        1233.2
    6            961.7      5       5991.9     13        6953.6
    7            648.6      4       1831.6      4        2480.2
    8            371.7      2       2597.4      6        2969.1
    9            848.8      5       4034.3      9        4883.1
   10            639.0      3       1191.9      3        1830.9
   11            237.0      1       1309.9      3        1546.9
   12          13187.7     72      21902.4     49       35090.1     55

Total          18272.1    100      45036.6    100       63308.7    100

SOURCE:  Prepared by Mathematica Policy Research
         using the ISDP/RAMIS II system.

NOTE:    This table is based on 5992 observations,
         1664 of which are elderly.
         Persons not present in the sample the
         full year have not been screened.

TABLE  18

                   Average Monthly Asset Income

                                              Assets < $3000
                Months      Months Not      Months      Months Not
Age             Working     Working         Working     Working

Elderly           125           70             10           10
Not Elderly        22           27              5            4
Total              33           32              6            5

                   Unweighted Number of Persons
                   upon which Averages are Based

                                              Assets < $3000
                Months      Months Not      Months      Months Not
Age             Working     Working         Working     Working

Elderly           138          107             75           55
Not Elderly     1,158          929            939          753
Total           1,296        1,036          1,014          808

SOURCE:  Prepared by Mathematica Policy Research
         using the ISDP/RAMIS II system.

NOTE:    This table consists of part-year workers
         who reported receiving asset income in
         both periods of work and periods of
         non-work. Persons not interviewed the full
         year have not been screened.


TABLE  21

DISTRIBUTION OF NON-ELDERLY RECIPIENTS OF
OTHER INCOME BY DURATION OF RECEIPT BY TYPE
OF OTHER INCOME RECEIVED

Number of          Irregular              Regular
Months           Count                 Count
Receiving        (1000)      %         (1000)      %

    1            4931.5     32          954.3     13
    2            1825.1     12          547.3      7
    3            2316.3     15          788.1     11
    4             765.1      5          177.9      2
    5            1040.9      7          327.0      4
    6             384.1      2          524.2      7
    7             456.8      3          122.9      2
    8             259.0      2          161.4      2
    9             835.0      5          476.4      6
   10             129.3      1          156.6      2
   11             355.8      2          311.7      4
   12            2346.1     15         2801.5     38

TOTAL           15645.0    100         7349.2    100

SOURCE:  Prepared by Mathematica Policy Research
         using the ISDP/RAMIS II system.


TABLE  22

CHANGE IN AVERAGE MONTHLY OTHER INCOME
ACROSS PERIODS OF WORK AND NONWORK

                   Average Monthly Other Income

                                              Assets < $3000
                Months      Months Not      Months      Months Not
Age             Working     Working         Working     Working

Elderly           330          325            302          321
Not Elderly       452          348            418          346
Total             407          339            381          338

                   Unweighted Number of Persons
                   upon which Averages are Based

                                              Assets < $3000
                Months      Months Not      Months      Months Not
Age             Working     Working         Working     Working

Elderly           204          201            137          133
Not Elderly       349          332            306          291
Total             553          533            443          424

SOURCE:  Prepared by Mathematica Policy Research
         using the ISDP/RAMIS II system.
The findings reported in this paper are based
on research carried out by Tim Carr, Pat Doyle
and Irene Smith Lubitz as a part of MPR's analysis
of participation in the Food Stamp Program,
pursuant to contract No. 53-3198-0-101 with the
Food and Nutrition Service, USDA. The authors
are indebted to Steven Carlson, Harold Beebout,
Jim Ohls and others for helpful comments on the
research on which this paper is based. The conclusions
presented here are solely those of the
authors and do not necessarily reflect the
opinions of the Food and Nutrition Service or
Mathematica Policy Research.

INTRODUCTION 

Researchers  and  policymakers  have  long  had  an 
interest  in  analyzing  the  factors  associated  with 
the  decision  of  eligible  individuals  and  households
to  participate  (or  not  to  participate)  in
income  maintenance  programs.   Accordingly,  a 
large  and  growing  body  of  research  on  this  topic 
has  appeared  in  recent  years.   There  has  also 
been  considerable  interest  in  patterns  of  program 
participation  over  time  (i.e.,  entry  into,  and 
exit  from,  these  programs).   However,  there  have 
been  relatively  few  studies  of  the  longitudinal 
patterns  of  participation  (which  we  hereafter 
refer  to  as  "turnover"),  because  of  stringent 
data  requirements.   Such  studies  that  have 
appeared  have  usually  been  based  on  peculiar 
samples  (e.g.,  participation  in  negative  income 
tax  experiments). 

In  this  paper  we  report  the  findings  of  an 
analysis  of  turnover  in  the  Food  Stamp  Program 
based  on  data  from  the  1979  Income  Survey 
Development  Program  Research  Panel  (hereafter 
referred  to  as  the  ISDP  data).   The  ISDP  data 
base  was  a  precursor  of  the  forthcoming  Survey  of 
Income  and  Program  Participation,  and  displays 
many  of  the  same  attributes  that  will  make  SIPP 
especially  advantageous  for  research  on  turnover. 
As  such,  our  research  illustrates  the  potential 
uses  of  SIPP  for  research  in  this  area. 

THE  DATA 

A  successful  analysis  of   food  stamp  turnover 
requires   data   possessing  several  essential 
characteristics. To our knowledge, the ISDP is
the first data base to exhibit all of these
desirable features simultaneously. These characteristics
are as follows:

o  The  ISDP  is  nationally  representative,  and 
it  yields  enough  observations  (about  7,500 
households)   to   permit  meaningful  analysis. 

o     The   ISDP  is  longitudinal;   retrospective 
interviews  were  conducted  quarterly  for  a 
period  of  fifteen  months. 

o     Data on income that a sample household
receives from various sources (including
the receipt of food stamps) are ascertained
on a month-by-month basis. Furthermore,
the exact timing of all changes in household
composition between interviews is
pinpointed. These data requirements are
especially crucial to the analysis of turnover
in the Food Stamp Program because many
households enter and exit the program in
the space of less than a year, and because
program participation may be triggered by
events such as a temporary drop in household
income that could not be picked up by
a survey that only measures a household's
annual income.

o  The  data  on  income,  household  composition,
and  assets  permit  us  to  simulate  eligibility
for  the  program,  using  the  eligibility
criteria  actually  used  in  1979.

o  There  is  a  wealth  of  explanatory  variables,
such  as  household  and  individual  characteristics,
types  of  income,  and  program  participation,
available  to  the  analyst  for  both
tabular  and  multivariate  analyses  of
turnover.

METHODOLOGY 

We   focus  primarily  on  three  measures   of   pro- 


o     The   entry   rate;    that   is,    the  probability 
that   a   household   that  was  not   receiving 
food   stamps   in  a  given  month  received   food 
stamps  the  next  month. 

o     The  exit   rate;    that   is,    the  probability 
that   a  household  that   received   food  stamps 
in  a   given  month  did   not   receive   food 
stamps   in  the  next  month. 

o     The  annual/monthly   ratio;    that   is,    the 
ratio  of   the   probability  that  a  household 
receives   food  stamps  over  the  course  of   a 
year   to  the  probability  that   it   receives 
food  stamps   in  a  given  month. 
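
The three measures defined above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input layout (one list of 0/1 monthly participation flags per household) and the function name are ours, sample weights are ignored, and every household is assumed to be observed in every month.

```python
def turnover_measures(panel):
    """Compute entry rate, exit rate, and annual/monthly ratio.

    panel: list of households, each a list of 0/1 flags, one per month
    (1 = received food stamps in that month). Unweighted sketch.
    """
    entries = exits = at_risk_entry = at_risk_exit = 0
    for months in panel:
        # Compare each month with the following month.
        for prev, cur in zip(months, months[1:]):
            if prev == 0:
                at_risk_entry += 1
                entries += cur          # nonparticipant who starts receiving
            else:
                at_risk_exit += 1
                exits += 1 - cur        # participant who stops receiving
    n_households = len(panel)
    n_months = len(panel[0])
    # Share of households receiving food stamps at any time in the period.
    ever = sum(1 for months in panel if any(months)) / n_households
    # Average share receiving food stamps in a given month.
    monthly = sum(sum(months) for months in panel) / (n_households * n_months)
    return {
        "entry_rate": entries / at_risk_entry,
        "exit_rate": exits / at_risk_exit,
        "annual_monthly_ratio": ever / monthly,
    }
```

A panel in which every participant received food stamps in all months would give a ratio of exactly 1; the ratio rises as more households participate for only part of the year.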

Both  tabular  and  multivariate   techniques  were 
used   in  the  analysis.      The  multivariate  analysis 
used  the  RATE  model   for  the  analysis    of   event 
histories    that  has   been  developed   by  Nancy  Tuma 
and her colleagues (Tuma et al., 1979), and
applied   in  studies   of  marital   instability,    unem- 
ployment  duration,   and  other   socioeconomic  pheno- 
mena.     Although   the   tabular  analysis  of   exit  and 
entrance  rates   focused  on  one   characteristic  at  a 
time,   making   it  difficult    to   sort   out  compo- 
sitional effects,    the  multivariate  analysis  in 
general  yielded   largely  similar  results. 

EMPIRICAL   FINDINGS 
Tabular  Analysis 

The  evidence  from  the  ISDP  panel,  via  tables 
and  multivariate  analysis,  is  that  turnover  in 
food  stamp  participation  is  high.  We  estimate 
the  ratio  of  annual  to  monthly  participation  at 
1.74,  indicating  that  the  number  of  households 
served  by  the  program  over  the  course  of  a  year 
is  about  70  percent  greater  than  the  caseload  in 
an  average  month  (Table  1).  To  illustrate  the 
implications of this finding, we note that program
data for 1979 indicate that the average
monthly caseload in 1979 was 6.5 million
households (USDA, 1979).   This annual-monthly
ratio  implies  that  11.3  million  households — about 
14  percent  of  all  households — received  food 
stamps  sometime  during  1979. 
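
The arithmetic behind these figures can be checked with a short sketch. The variable names are ours, and the total number of households is not reported in the text; it is backed out here from the 14 percent share and should be read as an implied, approximate figure.

```python
# Figures taken from the text.
monthly_caseload = 6.5e6        # average monthly caseload, 1979
annual_monthly_ratio = 1.74     # estimated ratio of annual to monthly participation
share_of_all_households = 0.14  # "about 14 percent of all households"

# Households served at some time during the year.
annual_households = monthly_caseload * annual_monthly_ratio

# Total households implied by the 14 percent share (an inference, not source data).
implied_total_households = annual_households / share_of_all_households

print(f"served during year: {annual_households / 1e6:.1f} million")
print(f"implied total households: {implied_total_households / 1e6:.0f} million")
```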

Most  of  the  food  stamp  households  observed  in 
our  data  received  food  stamps  for  only  part  of 
the  year.   About  two  thirds  of  the  sample  house- 
holds who  received  food  stamps  during  1979 
participated  in  the  program  for  less  than  the 
full  year  and  nearly  a  third  of  the  participants 
received  food  stamps  for  3  months  or  less  in 
1979.    Only  about  one-third  of  all  food  stamp 
recipient  households  observed  received  food 
stamps  "continuously"  (that  is,  for  all  months 
present  in  the  sample).   In  other  words,  a  truly 
"long-term  caseload"  may  account  for  only  about  a 
third  of  the  households  who  receive  food  stamps 
in  a  given  year. 

The  1979  sample  period  is  rather  short  for 
observing  individual  households'  food  stamp 
spells  over  time.   However,  even  during  this 
relatively  short  observation  period,  about  12 
percent  of  food  stamp  households  experienced  more 
than  one  spell   of  participation.   This  would 
seem  to  indicate  that  recidivism  in  the  Food 
Stamp  Program — households  returning  to  the 
program  after  not  participating  for  some 
interval — may  be  high. 

The  average  monthly  rate  of  exit  from  the  Food 
Stamp  Program  is  estimated  at  over  seven  percent. 
That  is,  in  a  given  month,  over  seven  percent  of 
the  caseload  may  be  expected  to  leave  the  program 
by  the  next  month.   The  exit  rate  in  a  given 
month  is  the  proportion  of  the  previous  month's 
caseload  (or  of  a  caseload  subset)  that  has  now 
left  the  program,  and  is  estimated  from  caseload 
counts  and  exits  from  the  program  in  each  month 
of  1979.   In  the  aggregate,  then,  a  substantial 
share  of  the  caseload  "turns  over"  each  month, 
with  perhaps  500  thousand  households   leaving  the 
program  and  being  replaced,  in  a  steady  state  of 
no  program  growth,  by  a  similar  number  of  new 
entrants.   (When  the  program  is  expanding, 
entrances  will  exceed  exits,  and  if  contracting,
exits  will  exceed  entrances.)   In  fact,  there  was 
significant  expansion  in  1979,  due  to  program- 
matic changes. 

The  program  entrance  rates  measure  inflows  to 
food  stamp  participation.   These  rates,  expressed 
relative  to  total  population,  are  much  lower  than 
exit  rates  but  in  fact  represent  flows  into  the 
program  that  approximately  equal  outflows 
(exits).   The  average  monthly  entrance  rate  as 
shown  in  Table  1  is  0.5%  per  month — that  is,  the 
average  probability  of  a  nonparticipant  in  a 
given  month  becoming  a  participant  in  the  next 
month  is  about  half  of  one  percent. 
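
A rough consistency check on these flows is sketched below. It is illustrative only: the number of nonparticipating households is not given in the text and is inferred here from the 6.5 million caseload and the statement that 11.3 million households were about 14 percent of all households; the exit rate is taken as 7 percent although the text says "over seven percent."

```python
monthly_caseload = 6.5e6   # from program data cited in the text
exit_rate = 0.07           # "over seven percent" per month (lower bound used)
entry_rate = 0.005         # 0.5 percent per month

# Assumption: total households implied by 11.3 million being about 14 percent.
total_households = 11.31e6 / 0.14
nonparticipants = total_households - monthly_caseload

monthly_exits = exit_rate * monthly_caseload
monthly_entrants = entry_rate * nonparticipants

print(f"exits per month:    {monthly_exits:,.0f}")
print(f"entrants per month: {monthly_entrants:,.0f}")
```

Under these assumptions both flows are a few hundred thousand households per month, which illustrates why an entry rate that looks small relative to the exit rate can still represent a flow of comparable size: it applies to a much larger base.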

Turnover  rates  in  the  Food  Stamp  Program, 
however  measured,  appear  to  be  quite  different 
for  different  kinds  of  households.   The  various 
measures  of  turnover  presented  for  different 
population  subgroups  indicate  that  the  more 
"permanent"  part  of  the  food  stamp  caseload 
includes  households  participating  in  AFDC  and 
other  welfare  programs,  and  elderly  households. 
A  more  transient  group  of  participants  includes 
younger  non-welfare  households  with  more  labor 
force  attachment  and  education. 


Multivariate  Analysis 

The  multivariate  analysis  of  turnover  in 
program  participation,  using  the  RATE  model, 
provides  estimates  of  the  independent  association 
of  household  characteristics  with  different 
turnover  rates.   The  results  of  estimating  our 
basic  model  of  transitions  to  and  from  partici- 
pation in  the  Food  Stamp  Program  are  presented  in 
Table  2.   The  precise  interpretation  of  these 
coefficients  is  not  entirely  straightforward,  as
entry  and  exit  rates  are  complicated  functions  of 
the  coefficients.    Note  that  the  qualitative 
effect  of  an  explanatory  variable  on  entry  and 
exit  rates  is  indicated  by  the  sign  of  its 
coefficient,  just  as  would  be  the  case  in  the 
more  familiar  linear  regression  model.   For 
instance,  the  coefficient  of  the  elderly/disabled 
dummy  variable  is  positive  in  the  entry  model, 
and  negative  in  the  exit  model.   This  indicates 
that  households  containing  elderly  or  disabled 
persons  are  more  likely  to  enter  the  program,  and 
less  likely  to  exit  from  it,  ceteris  paribus. 
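
The role of the sign can be illustrated with a small sketch. It assumes a log-linear rate, rate = exp(constant + coefficient * dummy), which is the usual form of event-history rate models but is an assumption here, since the text does not state the functional form. Only the constant and the elderly/disabled coefficients from Table 2 are used, and the function names are ours.

```python
import math

# Coefficients as reported in Table 2 (constant and elderly/disabled only).
COEFS = {
    "entry": {"constant": -5.374, "elderly_disabled": 0.132},
    "exit": {"constant": -2.841, "elderly_disabled": -0.683},
}

def monthly_rate(model, elderly_disabled):
    """Rate under the assumed log-linear form, all other dummies set to zero."""
    b = COEFS[model]
    return math.exp(b["constant"] + b["elderly_disabled"] * elderly_disabled)

def rate_multiplier(model, variable):
    """Factor by which the rate changes when the dummy goes from 0 to 1."""
    return math.exp(COEFS[model][variable])

for model in ("entry", "exit"):
    print(model, round(rate_multiplier(model, "elderly_disabled"), 3))
```

A positive coefficient gives a multiplier above one (a higher rate) and a negative coefficient gives a multiplier below one (a lower rate), whatever the values of the other variables.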

In  general,  the  results  are  consistent  with 
the  results  of  the  tabular  analysis  presented 
above,  in  that  the  household  characteristics  that 
appear  to  be  associated  with  high  entry  and  exit 
rates  on  the  basis  of  the  tabular  analysis  are 
also  those  that  appear  to  be  associated  with  high 
entry  and  exit  rates  on  the  basis  of  the  multi- 
variate analysis.   In  particular,  the  following 
findings  are  both  statistically  and  substantively 
significant. 

o  Nonwhite  households  who  are  not  in  the 
program  are  far  more  likely  to  enter  the 
program  in  any  given  month  than  otherwise 
similar  white  households;  furthermore,  non- 
white  households  that  are  receiving  food 
stamps  in  any  given  month  are  likely  to 
stay  on  the  program  longer  (i.e.,  have 
lower  exit  rates)  than  otherwise  similar 
white  households. 

o  Households  within  which  there  is  no 
currently  employed  person  are  both  more 
likely  to  enter  the  program  and  less  likely 
to  exit  the  program,  ceteris  paribus. 

o  Households  with  one  head  and  households 
with  an  elderly  or  disabled  person  tend  to 
stay  on  the  program  longer  than  other 
households,  all  other  things  being  equal. 

o  Households  that  receive  AFDC  are  both  more 
likely  to  enter  the  program,  and  less 
likely  to  leave,  than  otherwise  similar 
households. 

This last finding, particularly the higher entry
rate of AFDC households, is especially interesting
because it has been hypothesized by some
previous  researchers  that  there  is  a  "stigma" 
effect  that  acts  as  a  sort  of  psychological 
barrier  to  participation  in  income  maintenance 
programs  (e.g.,  see  Czajka,  1981).   These 
researchers  have  found  that  participation  in  one 
program  is  generally  correlated  with  participa- 
tion in  other  programs,  and  our  findings  tend  to 
confirm  theirs.   This  behavior  can  be  explained 
in  two  ways.  First,  it  may  be  the  case  that 
there are households whose members are psychologically
less averse to receiving welfare than
others,  and  hence  are  more  likely  to  apply  for 
benefits from all programs. Second, a household
that already participates in one program may
perceive little or no additional stigma from
applying for and receiving benefits from other
programs. Of course, these explanations are not
mutually  exclusive,  and  it  is  difficult,  if  not 
impossible,  to  disentangle  them  with  the  data 
available  to  us. 

The  estimated  coefficients  of  the  RATE  model 
can  be  used  to  predict  monthly  entry  and  exit 
rates,  annual  participation  rates,  the  ratio  of 
annual  to  monthly  participation  rates,  and  the 
expected  duration  of  spells  of  participation  for 
a  hypothetical  household  with  any  combination  of 
characteristics.    In  order  to  make  the  impli- 
cations of  our  estimated  models  more  transparent, 
we  have  calculated  the  values  of  these  functions 
for  certain  combinations  of  characteristics. 
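
One simple way to obtain such predictions is sketched below. It is not the calculation documented by the authors; it assumes a two-state, discrete-time Markov chain with constant monthly entry and exit probabilities that starts in its steady state, and the function name is ours.

```python
def markov_turnover(entry_prob, exit_prob, months=12):
    """Turnover measures implied by constant monthly entry/exit probabilities."""
    # Long-run share of households participating in a given month.
    steady_state = entry_prob / (entry_prob + exit_prob)
    # Probability of never participating during the period: not participating
    # in the first month and not entering in any of the remaining months.
    never = (1.0 - steady_state) * (1.0 - entry_prob) ** (months - 1)
    annual = 1.0 - never
    return {
        "monthly_participation": steady_state,
        "annual_participation": annual,
        "annual_monthly_ratio": annual / steady_state,
        # Mean length of a spell when the exit probability is constant.
        "expected_duration_months": 1.0 / exit_prob,
    }

result = markov_turnover(entry_prob=0.005, exit_prob=0.07)
print(round(result["annual_monthly_ratio"], 2))
```

With the aggregate rates reported in the tabular analysis (an entry rate of 0.5 percent and an exit rate of about 7 percent per month), this sketch gives an annual/monthly ratio of roughly 1.75, close to the 1.74 estimated directly from the data.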

Specifically,  our  approach  to  this  presenta- 
tion is  as  follows.   First,  we  define  a  "base- 
line" household  that  has  characteristics  that  are 
fairly  typical:   a  white  household  with  two 
heads,  at  least  one  of  whom  is  employed,  and  no 
children.   Furthermore,  this  hypothetical  house- 
hold does  not  receive  AFDC,  nor  does  it  contain 
an  elderly  or  disabled  person.   We  have  calculat- 
ed predicted  monthly  entry  and  exit  rates  and 
other  measures  of  turnover  for  the  baseline 
household;  these  results  are  presented  in  the 
first  row  of  Table  3. 

The  numbers  in  the  other  rows  of  Table  3  are 
derived  by  altering  the  assumed  values  of  the 
explanatory  variables  one  by  one.   For  instance, 
the  row  labeled  "elderly/disabled"  pertains  to  a 
hypothetical  household  that  contains  an  elderly 
or  disabled  person,  but  is  otherwise  similar  to 
the  baseline  household  defined  above,  and  so
forth.   As  one  would  expect  based  on  the  results 
in  Table  1,  there  are  certain  identifiable  types 
of  "low-turnover"  households,  such  as  households 
with  an  elderly  or  disabled  person  and  households 
with  no  person  who  is  currently  employed,  that 
are  characterized  by  a  low  ratio  of  annual  to 
monthly  participation  rates  and  a  high  predicted 
duration  on  the  program. 

The  last  three  rows  illustrate  the  effect  on  a 
household  of  having  two  or  more  characteristics 
that  are  associated  with  low  turnover.   The  first 
two  of  these  rows  simulate  the  case  of  a  house- 
hold headed  by  a  single  person  and  containing  a 
child  who  is  under  6  (in  the  first  case)  and  over 
6  (in  the  second  case).   The  last  row  describes  a 
hypothetical  household  consisting  of  a  retired 
elderly  person  who  lives  alone.   Our  results 
imply that if he or she receives food stamps, he or she
will  be  on  the  program  for  an  average  of  over 
four  years  before  exiting,  several  times  longer 
than  the  expected  duration  of  participation  for 
the  population  as  a  whole. 


FOOTNOTES 


1 These estimates are based on households
present in sample for the full calendar year.
However, when households present for only part of
the year are included, the results are similar.
Note that these estimates, although illustrative
of caseload composition in a given year, do not
imply estimates of average duration of spell
length due to the restricted sample period.
Households with fewer than 12 food stamp months
in 1979 may be observed during spells that began
before or ended after that year.

2 Estimate based on "true spells," that is, spells of
participation separated by an interval in which
the household was present in the sample but not
receiving food stamps.

3 This calculation is based on "true exits"
only, where a true exit is, generally speaking,
one where the unit remains in the sample but is
observed to be no longer receiving food stamps,
as opposed to a unit that leaves the sample
following some period receiving food stamps.

4 This estimate is based on an average monthly
caseload of 6.5 million households in 1979, as
indicated by program data.

5 The weighted ISDP counts show about 3.7
million entrances and 3.2 million exits over the
course of 1979, consistent with the observed
increase in the sample caseload for that period.

6 The manner in which these functions were
calculated is described in Carr et al. (1984,
Appendix C).

7 A detailed description of the manner in which
these numbers are calculated is provided in Carr
et al. (1984, Appendix C). These calculations
assume that the conditions underlying the simple
Markov model are satisfied; in particular that
the explanatory variables account for all or most
systematic variation in entry and exit rates.

ESTIMATED COEFFICIENTS OF A MODEL OF TURNOVER IN FOOD STAMP PARTICIPATION

Independent Variable         Entry Model        Exit Model

Constant                     -5.374             -2.841

Elderly/Disabled               .132              -.683
                              (0.92)            (-3.59)**

Nonwhite                      1.601
                             (14.89)**          (-1.98)**

Youngest child under 6

Chi-square

Number of observations          667

Source:  Calculated by Mathematica Policy Research from 1979 ISDP Panel.

Note:  Asymptotic t statistics are in parentheses.

*    Significant at .10 level (one-tailed test).
**   Significant at .05 level (one-tailed test).
***  Significant at .01 level (one-tailed test).

THE  MEASUREMENT  OF  HOUSEHOLD  WEALTH  IN  THE  SURVEY  OF  INCOME  AND  PROGRAM  PARTICIPATION
Enrique  J.  Lamas  and  John  M.  McNeil,  U.S.  Bureau  of  the  Census 


I.  Introduction 

The  Survey  of  Income  and  Program  Parti- 
cipation (SIPP)  is  designed  to  obtain  current 
estimates  of  income,  labor  force  activity,  and 
participation  in  government  transfer  programs. 
Currently  available  survey  data  have  important 
limitations  which  have  been  well  recognized 
and  documented  [Ycas,  1982;  Ycas  and  Lininger,
1982;  Nelson,  McMillen,  and  Kasprzyk,  1983]. 
The  March  supplement  to  the  Current  Population 
Survey (CPS) is presently the major source of
data on the distribution of income.   In the
March  CPS,  household  members  at  the  time  of 
the  interview  are  asked  to  recall  their  income 
for  the  previous  calendar  year.  While  the  CPS 
survey  does  well  in  estimating  wage  and  salary 
income,  it  has  serious  underreporting  problems 
with  respect  to  property  income  (such  as  inter- 
est, dividend,  and  rental  income)  and  several 
other  income  types  (such  as  Supplemental  Secu- 
rity Income  (SSI),  and  worker's  compensation)*. 
The  CPS  has  other  limitations:   subannual  in- 
come estimates  are  not  available;  annual  house- 
hold income  estimates  cannot  be  adjusted  for 
changes  in  household  composition  or  size  during 
the  year,  for  example,  as  a  result  of  a  death, 
marriage, or divorce; and information important
for assessing economic well-being of the
population and for policy analysis, such as
assets, taxes, and other characteristics, is
not covered.

SIPP  is  designed  to  overcome  many  of  these 
limitations.   Income  data  from  a  wide  variety 
of  sources  including  wage  and  salary  and  govern- 
ment transfer  programs  are  collected  on  a  mon- 
thly basis.  Changes  in  household  composition 
are  also  identified  on  a  monthly  basis.   In 
addition  to  improving  measures  of  income  and 
program  participation,  SIPP  will  also  obtain 
detailed data on other important topics including
assets and liabilities, and taxes.  Asset
and  liability  data  are  important  in  determining 
program  eligibility  and  assessing  the  economic 
situation  of  families.   Information  on  the  dis- 
tribution of  household  wealth  is  important  and 
changes  in  wealth  provide  data  on  consumer 
savings.  SIPP  will  be  unique  among  surveys  in 
providing  a  recurring  series  on  household 
wealth.  The  focus  of  this  paper  is  to  describe 
the  effort  of  SIPP  to  provide  estimates  of 
household  wealth  holdings  in  the  United  States 
and  to  present  some   preliminary  results. 

Data  to  study  the  composition  and  distri- 
bution of  household  and  personal  wealth  have 
come  from  three  sources:  estate  tax  returns, 
synthetic  databases,  and  surveys  [Wolff,  1983].
Each  source,  however,  has  its  limitations. 
Estate  tax  returns  consist  of  records  filed 
with  the  Internal  Revenue  Service  (IRS)  for 
estate  tax  assessments.  Coverage  of  the  popu- 
lation is  a  major  problem  for  this  data 
source.  Only  decedents  with  substantial
wealth  holdings  ($300,000  or  more  in  1981  and 
$500,000  or  more  in  1982)  are  required  to  file 
estate  returns.  Under  certain  assumptions 
about mortality, an "estate multiplier technique"
has been used to estimate wealth for
living  individuals  who  would  have  been  required 
to  file  estate  tax  returns  [Schwartz,  1983]. 
However,  this  technique  has  limited  population 
coverage,  and  only  provides  estimates  for  top 
wealth  holders. 
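
The idea behind the estate multiplier technique can be sketched as follows. Each decedent is treated as a sample draw from the living population of the same demographic group and is weighted by the inverse of that group's mortality rate. The group labels, mortality rates, and function name below are hypothetical and serve only to show the mechanics.

```python
def estate_multiplier_estimate(estates, mortality_rates):
    """Estimate living wealth holders and their wealth from estate records.

    estates: list of (group, wealth) pairs, one per decedent.
    mortality_rates: dict mapping group -> annual probability of death.
    """
    holders = 0.0
    wealth = 0.0
    for group, estate_value in estates:
        # One decedent stands for 1 / mortality_rate living persons.
        multiplier = 1.0 / mortality_rates[group]
        holders += multiplier
        wealth += multiplier * estate_value
    return holders, wealth

# Hypothetical inputs: two decedents from groups with different death rates.
holders, wealth = estate_multiplier_estimate(
    [("age 45-64", 1_000_000.0), ("age 65+", 2_000_000.0)],
    {"age 45-64": 0.01, "age 65+": 0.05},
)
```

Because only estates above the filing threshold appear in the records, the resulting totals describe top wealth holders only, which is the coverage limitation noted above.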

Synthetic  databases,  such  as  the  Measurement 
of  Economic  and  Social  Performance  file  (MESP) 
[Wolff,  1983]  and  the  1973  Office  of  Tax  Analy- 
sis file  [Greenwood,  1983],  merge  data  from 
several  sources  in  order  to  get  appropriate 
population  and  asset  coverage.  The  MESP  file 
consists  of  a  statistical  match  of  the  1970 
census  1/1,000  public  use  sample  to  IRS  tax 
returns.  All  asset  values  were  imputed  from 
the  IRS  tax  information  for  financial  assets 
and  from  the  Consumer  Expenditure  Survey  (CES) 
for  consumer  durables.  The  Office  of  Tax 
Analysis  file  consists  of  a  statistical  match 
of  the  1973  CPS  to  tax  records  from  the  1973 
Individual  Income  Tax  Model.  The  IRS  data  on 
dividends  and  interest  are  used  to  estimate 
the  market  value  of  financial  assets  and  the 
IRS  data  on  property  taxes  are  used  to  estimate 
the  value  of  real  estate  holdings.  Net  wealth 
estimates  are  then  imputed  using  a  regression 
of  net  wealth  on  asset  values  based  on  IRS 
estate  tax  data.  Limiting  these  databases  are 
various  assumptions  which  underlie  the  matching 
procedures,  the  procedure  of  income  capitali- 
zation, and  the  extension  of  estimates  to  the 
whole  population  [Smith,  1983]. 
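
Income capitalization, as used in these synthetic files, amounts to dividing a reported income flow by an assumed rate of return. The sketch below shows the calculation; the yield value and the function name are illustrative assumptions, not figures from the files described.

```python
def capitalize_income(annual_income, assumed_yield):
    """Estimate an asset's market value from the income it generates."""
    if assumed_yield <= 0:
        raise ValueError("assumed_yield must be positive")
    return annual_income / assumed_yield

# Hypothetical example: $500 of dividends at an assumed 5 percent yield.
estimated_value = capitalize_income(500.0, 0.05)
```

The estimate is only as good as the assumed yield: halving the assumed yield doubles the estimated value, which is one reason the capitalization step is listed among the limiting assumptions.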

A third source of data consists of surveys covering
household  assets  and  liabilities.  Major  previ- 
ous surveys  include  the  Survey  of  Consumer 
Finances  (SCF),  the  Survey  of  Financial  Charac- 
teristics of  Consumers  (SFCC),  and  the  Income 
Survey  Development  Program  (ISDP).  The  SCF 
are  periodic  surveys  (the  latest  conducted  in 
1983)  with  sample  sizes  of  3,500  to  4,000 
consumer  units.  The  SFCC,  conducted  in  1962 
and  1963,  canvassed  2,557  units.   Information 
on  the  size  and  components  of  wealth  was 
collected  as  of  December  31,  1962  [Projector 
and  Weiss,  1966].  Both  the  SFCC  and  the  1983 
SCF  used  IRS  records  in  order  to  oversample 
high  income  households.  The  ISDP,  conducted 
in  1979  and  1980,  collected  information  on 
assets  and  liabilities  from  approximately 
7,000  households.  The  ISDP  was  a  research 
panel  designed  to  prepare  for  SIPP.  The  ISDP 
interviewed  respondents  on  a  quarterly  basis 
on  six  occasions.   In  the  fifth  interview,  a 
supplement  on  assets  and  liabilities  was  admin- 
istered which  collected  information  as  of 
December  31,  1979. 

Surveys  have  the  advantage  of  being  able 
to  have  samples  and  survey  instruments  specifi- 
cally designed  to  gather  the  necessary  detail 
and  information  to  estimate  wealth  holdings. 
Surveys, however,  have  suffered  from  limitations 
in  population  and  asset  coverage. 

In  the  next  section  of  this  paper,  signi- 
ficant features  of  the  SIPP  design  with  respect 
to  the  measurement  of  wealth  are  discussed.  The 
final section presents results from the first
wave of SIPP.

II.  SIPP  Design  Features 

SIPP  is  a  panel  survey  consisting  of  appro- 
ximately 20,000  households  which  are  interviewed 
every   four  months  for  a  period  of  2  1/2  years. 
As  SIPP  progresses,  new  panels  will  be  started 
every  year  which  will  allow  cross-sectional  an- 
alyses based  on  a  total  sample  of  approximately 
35,000  households.  At  each  interview,  informa- 
tion on  income,  program  participation,  and  other 
characteristics  for  each  of  the  previous  four 
months  is  obtained  for  each  person.  Persons  who 
move  during  the  life  of  the  panel  are  followed
and  interviewed  at  their  new  addresses.2  Ques- 
tionnaire items  include  a  "core"  set  of  ques- 
tions which  are  repeated  in  each  wave  of  inter- 
viewing. These  items  cover  labor  force  parti- 
cipation, detailed  income  recipiency,  and 
participation  in  government  programs.  For 
waves  2  through  8,  the  core  items  are  updated 
and  the  questionnaire  is  expanded  with  addi- 
tional questions  on  items  not  covered  in  the 
core.  Detailed  questions  concerning  the 
amounts  of  personal  and  household  assets  and
liabilities  are  included  in  wave  4  which  is  to 
be  conducted  in  September  through  December 
1984.  These  items  will  be  updated  one  year 
later  in  wave  7.  Asset  and  liability  coverage 
is  comprehensive. 

The  SIPP  design  is  expected  to  have  a  posi- 
tive effect  on  the  ability  to  measure  wealth. 
Research  has  found  that  a  major  source  of  bias 
in  survey  estimates  of  wealth  is  the  nonre- 
porting  of  asset  ownership  [Ferber,  1982]. 
Additionally,  there  is  some  nonreporting  of 
asset  values.  Several  features  of  SIPP  are 
expected  to  have  positive  effects  on  the  repor- 
ting of  ownership  and  value  of  assets  [Radner 
and  Vaughan,  1984]. 

First,  asset  ownership  questions  precede 
questions  on  asset  values.  Separating  owner- 
ship and  amount  questions  helps  focus  on  iden- 
tification of  asset  holdings.  In  addition,  the 
relatively sensitive amount questions do
not negatively affect the reporting of asset
ownership.

Second,  the  longitudinal  nature  of  SIPP 
helps  identify  asset  ownership.  Asset  owner- 
ship information  is  collected  in  each  wave. 
In  the  initial  interview,  a  set  of  detailed 
questions  designed  to  identify  ownership  of 
income  earning  assets  are  asked  for  each  person 
in  the  household.  An  asset  roster  is  created 
and  recorded  in  the  control  card.  In  subsequent 
interviews,  the  respondent's  asset  roster 
for  the  previous  wave  is  pretranscribed  to  the 
current  questionnaire.  During  an  interview, 
the  pretranscribed  asset  roster  is  checked  for
accuracy.  Then  questions  are  asked  to  determine 
whether  any  assets  have  been  liquidated  or 
whether  any  new  ones  have  been  acquired.  With 
this  procedure,  relatively  accurate  asset 
ownership information is obtained before respondents
are asked about asset values and
amounts of liabilities in wave 4.
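
The wave-to-wave roster procedure can be pictured as a simple set update. The sketch below is an illustration only; the asset labels and the function name are hypothetical and do not reflect the actual control card layout.

```python
def update_roster(previous_roster, liquidated, acquired):
    """Return the current-wave asset roster for one respondent.

    previous_roster: asset types carried over from the prior wave.
    liquidated: asset types the respondent reports no longer holding.
    acquired: asset types newly reported in the current wave.
    """
    unknown = set(liquidated) - set(previous_roster)
    if unknown:
        # An asset cannot be liquidated if it was never on the roster.
        raise ValueError(f"not on previous roster: {sorted(unknown)}")
    return (set(previous_roster) - set(liquidated)) | set(acquired)

# Hypothetical example for one respondent.
wave2_roster = update_roster(
    {"savings account", "stocks"},
    liquidated={"stocks"},
    acquired={"certificate of deposit"},
)
```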

In longitudinal surveys, attrition, that
is, dropping out of the sample, is of concern.
While  some  respondents  leave  the  sample,  there 
is  evidence  to  suggest  that  cooperation  or 
rapport obtained in repeated interviews increases
the reliability of financial data [Ferber and
Frankel, 1978].  Furthermore, the longitudinal
nature  also  provides  the  opportunity  to  gather 
information  missed  in  a  previous  interview. 
If  one  interview  is  missed  for  the  household 
or  an  individual  respondent,  a  "Missing  Wave" 
section  is  completed.  In  this  section,  a 
limited set of key questions is asked concerning
labor force participation, income recipiency,
and asset ownership during the missed
wave. 

Third,  ownership,  income,  and  value  of 
assets  are  asked  for  each  individual  by  type 
of  ownership.  Information  is  gathered  for 
assets  held  jointly  with  spouse  and  for  assets 
held  individually.  There  is  some  evidence 
that  nonreporting  of  ownership  is  higher  for 
assets  held  by  one  individual  as  compared  to 
assets  held  jointly  with  others  [Ferber  and 
Frankel,  1978].  Asking  respondents  by  type  of 
ownership  directly  may  tend  to  reduce  differen- 
tial nonresponse  between  assets  held  in  own 
name  and  jointly  with  others.  In  addition, 
collecting  income  and  asset  values  by  type  of 
ownership  rather  than  one  total  amount  for  an 
asset  may  tend  to  give  more  accurate  income 
and  asset  value  amounts.  Because  both  asset 
income  and  asset  value  are  collected,  informa- 
tion about  asset  income  can  be  used  for  asses- 
sing the  reasonableness  of  the  asset  value 
data  and  for  imputing  missing  values. 
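
One way such a reasonableness check might work is to compare the yield implied by the reported income and value with a plausible range. The bounds and function names below are hypothetical choices for illustration; no edit rules are specified in this paper.

```python
def implied_yield(annual_asset_income, asset_value):
    """Annual income as a fraction of reported value, or None if undefined."""
    if asset_value <= 0:
        return None
    return annual_asset_income / asset_value

def is_plausible(annual_asset_income, asset_value, low=0.0, high=0.25):
    """Flag whether the implied yield falls inside an assumed plausible range."""
    y = implied_yield(annual_asset_income, asset_value)
    return y is not None and low <= y <= high

# Hypothetical reports: $500 income on $10,000 passes; $5,000 on $10,000 does not.
ok = is_plausible(500.0, 10_000.0)
suspicious = not is_plausible(5_000.0, 10_000.0)
```

Income reported for a 4-month reference period would need to be annualized before being compared with an annual yield range.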

Finally,  "callback  items"  have  been  intro- 
duced for  critical  questions  concerning  asset 
values.  Callback  items  are  designed  to  reduce 
nonreporting  of  income  and  asset  amounts.  For 
selected  items,  when  a  respondent  answers 
"don't  know,"  the  interviewer  reads  a  statement 
on  the  importance  of  the  information  requested 
and  the  respondent  is  asked  whether  it  would 
be  possible  to  call  back  later  for  an  estimate 
of  the  amount.  For  respondents  agreeing  to  be 
called  back,  a  "reminder  card"  is  provided  to 
the  respondent  with  the  requested  information 
checked.  The  interviewer  telephones  the  respon- 
dent at  an  agreed  upon  time  to  obtain  the 
missing  information  which  is  entered  in  a 
section  of  the  questionnaire  reserved  for 
callback  amounts.  The  impact  of  the  callback 
system  is  to  reduce  the  incidence  of  missing 
information.  Since  callbacks  involve  added 
respondent  and  interviewer  burden,  only  a 
limited  number  of  items  can  be  identified  as 
callback  items.  To  supplement  the  callback 
procedure,  special  instructions  have  been 
included  for  other  important  questions  which 
instruct  interviewers  to  probe  for  an  estimate 
before  accepting  a  "don't  know"  response.  In 
this  way,  interviewers  are  alerted  to  key 
items  for  which  they  should  give  special  effort 
to  obtain  estimates. 

An  important  issue  for  surveys  which  mea- 
sure wealth  is  population  coverage.  Evidence 
suggests  that  wealth  holdings  are  concentrated. 
Studies  have  estimated  that  the  top  5  percent 
of households hold approximately 30 to 50 percent
of net wealth [Wolff, 1983; Greenwood,
1983;  and  Smith,  1983].   In  addition,  holdings 
of  certain  assets  such  as  stocks  are  even  more 
highly  concentrated  with  the  top  5  percent 
holding  approximately  70  percent  of  this  asset. 
As  a  result,  the  normal  SIPP  area  frame  sample 
has  a  limited  coverage  of  the  top  wealthholders.
This  problem  was  noted  by  the  Census  Advisory 
Committee  on  Population  Statistics  which  recom- 
mended that  "the  Bureau  explore  the  opportunity 
to  augment  the  SIPP  sample  design  every  5  years 
to  oversample  the  upper  end  of  the  income  and 
wealth  distribution,  where  special  effort  would 
be  directed  to  producing  a  reliable  indication 
of  the  entire  distribution  of  wealth"  [U.S. 
Bureau  of  the  Census,  1983]. 

The  Census  Bureau  has  done  some  preliminary 
work  to  explore  the  methods  which  might  be 
used  to  obtain  data  files  in  which  top  wealth- 
holders  are  adequately  represented.  One  possi- 
bility is  to  use  IRS  data  files  to  develop  a 
list  frame  sample  of  top  income  holders.  An 
alternative  approach  is  to  use  estate  tax 
data  to  estimate  wealth  distribution  of  top 
wealthholders.  Using  estate  tax  data  in  con- 
junction with  SIPP  data  has  the  potential  to 
improve  population  coverage.  At  this  time, 
however,  the  work  in  this  area  is  very  preli- 
minary.  In  any  case,  the  probable  limitations 
of  SIPP  wealth  estimates  and  the  need  to  im- 
prove population  coverage  have  been  recognized 
by  the  Census  Bureau  and  the  work  to  improve 
the  estimates  continues. 

III.  Data  and  Results 

A.  Nonresponse  Rates  for  Asset  Ownership

Ownership  of  assets  is  established  in  the 
first  wave  of  SIPP  and  an  asset  roster  is 
constructed  for  each  respondent.  This  roster 
is  verified  and  updated  in  subsequent  waves. 
The  amounts  of  assets  held  are  not  systemati- 
cally covered  until  the  fourth  wave  topical 
module.   In Table 1, the levels of missing
information on asset ownership are shown.
Respondents  can  answer  that  they  "don't  know" 
if  they  own  a  specific  asset  type  or  they  can 
refuse  to  answer.  The  nonresponse  rate  for 
all  asset  types  is  low  at  1.4  percent  of  all 
persons  asked  about  asset  ownership.  For 
specific  asset  types,  the  rates  differ  and 
range  from  0.9  percent  for  rental  property  and 
royalties  to  2.2  percent  for  certificates  of 
deposit.4   When  the  nonresponse  rates  are 
decomposed,  the  refusal  rates  are  generally 
higher than the "don't know" rates.  This
holds in particular for self respondents, who in
general have higher refusal than "don't know"
rates.  However, as would be expected, the
frequency of "don't know" responses for asset
ownership is greater for proxy respondents
(ranging from 0.3 to 1.7 percent) than for self
respondents (ranging from 0.1 to 0.3 percent).
The refusal
rates  for  both  types  of  respondents  were  simi- 
lar (ranging  from  0.7  to  1.5  percent).  As  a 
result  of  higher  "don't  know"  rates,  proxy 
respondents  had  somewhat  higher  overall  nonre- 
sponse rates.  For  both  types  of  respondents, 
however, the absolute level of the nonresponse
rates is low.

To  examine  the  asset  ownership  information 
further,  nonresponse  rates  by  demographic  and 
socioeconomic  characteristics  are  shown  in 
Table  2.  Several  patterns  emerge.  Nonre- 
sponse rates  for  each  asset  type  are  appro- 
ximately the  same  between  sex  or  race  groups. 
Some  of  the  rates,  however,  are  significantly 


different  by  age  and  education  levels.  The 
nonresponse  rates  for  each  asset  type  increase 
with  the  age  of  the  respondent.  For  respon- 
dents in  the  less  than  25  and  25  to  34  age 
groups, nonresponse rates range from 0.3
to 1.2 percent; for the 35 to 44 age group,
rates range from 0.9 to 2.0; for the 45 to 54
and 55 to 64 age groups, rates range from 1.1
to  2.8;  and  for  the  65  and  older  age  group, 
rates  range  from  1.3  to  3.5  percent.  While 
the  rates  for  the  oldest  age  group  are  signifi- 
cantly higher  than  for  the  youngest  group 
for  each  asset  type,  it  should  be  noted  that 
even  the  highest  rates  (3.4  and  3.5  percent 
for  money  market  deposit  accounts  and  certifi- 
cates of  deposit,  respectively)  are  relatively 
low. 

Nonresponse  rates  also  differ  by  education 
level. The nonresponse rates for ownership of
several asset types are higher for college
graduates  than  for  respondents  who  did  not 
complete  high  school  or  who  completed  some 
college.  Nonresponse  rates  for  college  gradu- 
ates range  from  1.3  to  2.6  percent,  while  the 
rates  for  respondents  with  less  than  high 
school  or  some  college  education  levels  range 
from  0.5  to  1.8  percent.  Differences  in  rates 
between these groups are significant for each
detailed asset type, except savings accounts
and interest earning checking accounts.

B.  Asset  Ownership  Patterns 

Several patterns emerge from the SIPP asset
ownership results. The frequency of
asset  ownership  is  shown  in  Table  3.  The  most 
frequently  held  assets  were  savings  accounts 
and  home  ownership  with  65.7  and  64.1  percent 
of  households  reporting  ownership,  respec- 
tively. The  ownership  of  the  remaining  assets 
was  reported  by  approximately  20  percent  of 
households,  with  rental  properties  and  royal- 
ties reported  by  10  percent  or  less  of  house- 
holds. Of  interest  are  the  asset  types  which 
became  newly  available  after  1982  as  a  result 
of  deregulation  in  the  banking  industry. 
Approximately  23  percent  of  the  households 
reported  ownership  of  one  or  more  interest 
earning  checking  accounts,  while  17  percent  of 
the  households  reported  ownership  of  money 
market  deposit  accounts. 

The  SIPP  wave  1  results  for  selected  assets 
are  compared  to  asset  ownership  data  from 
other  surveys  in  Table  4.   In  general,  the 
ownership  estimates  obtained  in  wave  1  of  SIPP 
are  similar  to  estimates  derived  from  other 
sources.  The  only  major  difference  occurs  in 
savings  accounts.  The  Consumer  Credit  Survey 
and  the  ISDP  results  show  approximately  75 
percent  of  households  reporting  ownership  of 
savings  accounts,  while  SIPP  results  show 
approximately  66  percent  of  the  households 
owning  such  accounts.  A  plausible  explanation 
is  that  savings  accounts  are  highly  substi- 
tutable  with  the  newly  available  assets. 
Individuals  have  an  incentive  to  switch  from 
savings  accounts  to  new  accounts  for  the 
greater  liquidity  available  with  interest 
earning  checking  accounts  and  for  the  higher 
interest  rates  available   with  money  market 
deposit  accounts.  The  result  is  a  negative 
impact  on  the  percent  of  households  owning 
savings accounts.

C.  Asset  Amount  Response 

An  important  feature  of  the  SIPP  design 
is to have income recipiency and asset ownership
questions precede income and asset amount
questions. Once income recipiency is
established  for  a  respondent,  questions  on 
amount  of  income  are  asked  by  income  type. 
Income from assets, e.g., interest, dividends,
and rental income, is covered in each wave.
This  section  focuses  on  the  reporting  of  pro- 
perty income  and  asset  amounts  in  the  first 
wave  of  SIPP. 

The  questions  on  asset  income  are  divided 
by  type  of  ownership,  that  is,  assets  held 
jointly  with  spouse,  and  assets  in  own  name  or 
held  jointly  with  others.  Persons  identified 
as  holding  an  asset  are  asked  questions  con- 
cerning the  amount  of  income  received.   If 
respondents  cannot  provide  the  amount  of  in- 
terest earned,  the  average  balances  of  accounts 
are  asked  in  order  to  impute  the  interest 
income  received.  A  callback  is  provided  for 
the  balance  amount  if  the  person  does  not  know 
the  amount  at  the  time  of  the  interview  but 
can  provide  an  estimate  later.  Unlike  the 
ownership  items,  questions  on  interest  amounts 
refer  to  grouped  assets.   In  particular,  assets 
held  with  financial   institutions  (savings 
accounts,  money  market  deposit  accounts,  certi- 
ficates of  deposit,  and  interest  earning  check- 
ing accounts)  are  grouped  together  and  the 
total interest earned is asked. Similarly,
other  interest  earning  assets  (money  market 
funds,  U.S.  Government  securities,  municipal 
or  corporate  bonds,  and  other  interest  earning 
assets)  are  grouped  and  interest  income  from 
those  sources  is  covered. 

The  reporting  of  interest  income  is  sum- 
marized in  Table  5.  On  the  average,  two-thirds 
of  respondents  reported  an  amount  of  interest 
earned.  Approximately  15  to  20  percent  of 
respondents  did  not  know  the  interest  income 
amount,  but  provided  an  estimate  of  the  total 
balance  in  the  accounts.   In  general,  over  80 
percent  of  respondents  gave  the  amount  of 
interest  or  the  balance  in  the  accounts.  Less 
than  11  percent  did  not  know  the  amount  of 
interest  and  did  not  provide  a  balance  amount. 
Only  5  to  8  percent  of  respondents  refused  to 
report  the  interest  income. 

These  patterns  of  reporting  did  not  differ 
by  type  of  ownership,  that  is,  joint  with 
spouse  versus  in  own  name  or  with  others. 
However,  the  frequency  of  persons  who  refused 
or  did  not  provide  an  interest/balance  amount 
was  lower  on  average  for  assets  at  financial 
institutions  (11.7  percent)  than  other  interest 
earning  assets  (17.1  percent). 

Of  interest  are  the  cases  which  did  not  know 
the  amount  of  interest  earned  and  which  were 
asked  the  average  balance  held  for  imputation 
purposes.  Over  5,900  respondents  were  asked 
the  balance  of  accounts  held  with  financial 
institutions,  and  455  were  asked  the  balance 
of  other  interest  earning  accounts.  For  assets 
held  with  financial  institutions,  approximately 
78 percent of the respondents who reported they
did  not  know  the  interest  income  earned  were 
resolved  using  the  average  balance  reported; 


for  other  interest  earning  assets,  67  percent 
of  the  "don't  know"  cases  reported  a  balance 
amount.  Only  6  percent  of  the  respondents 
refused  to  report  an  amount. 

IV.  Conclusion 

The  SIPP  will  provide  annual  estimates  of 
wealth. Information on assets and liabilities
is  useful  for  many  types  of  analyses,  including 
program  eligibility  simulation  studies  and 
measurements  of  the  distribution  of  wealth. 
In  this  paper,  significant  features  of  the 
survey  were  presented  along  with  some  prelimi- 
nary results  from  SIPP.  Several  design  features 
of SIPP, including the longitudinal nature of the
survey design, the separation of asset ownership and
asset value questions, the updating of the asset roster
at each interview, callbacks, probe instructions,
and  a  missing  wave  section,  are  likely  to  have 
a  positive  effect  on  the  reporting  of  asset 
ownership.  In  addition,  asset  and  liability 
coverage  is  comprehensive.  Results  from  the 
first  wave  of  SIPP  show  that  nonresponse 
rates  for  asset  ownership  are  low.   In 
addition,  frequency  of  ownership  patterns 
are  reasonable  and  the  results  are  comparable 
to  findings  from  other  surveys.  Preliminary 
results  also  show  some  indication  of  improve- 
ments in  nonresponse  rates  for  items  on  asset 
amounts. 

FOOTNOTES 


1  When   comparing  CPS  income  aggregates  to 
independent  benchmark  estimates,  CPS  captures 
97.4  percent  of  independent  wage  and  salary 
totals.  However,  the  CPS  only  covers  41.5 
and  44.1  percent  of  interest  and  dividend 
totals,  69.4  percent  of  SSI  totals,  and  42.3 
percent  of  worker's  compensation  totals. 

2  Persons  who  move  individually  or  in  groups 
are  followed  if  they  relocate  within  100 
miles of any SIPP PSU. Persons who move
into institutions are not interviewed while
they are institutionalized.

3  Home  equity,  automobiles,  and  life  insurance 
information  were  collected  in  wave  2,  pri- 
marily to  evaluate  government  program  eligi- 
bility. 

4 Significance tests were performed on the
differences between rates using the formula

$$ \frac{P_1 - P_2}{\sqrt{F_1 \, \dfrac{P_1 (1 - P_1)}{N_1} + F_2 \, \dfrac{P_2 (1 - P_2)}{N_2}}} $$

where $P_1$ and $P_2$ are the proportions of nonresponses,
$F_1$ and $F_2$ are sample design
factors, and $N_1$ and $N_2$ are the numbers of
sample cases for each proportion.
Differences noted in the text are significant
at the 95 percent confidence level.
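
As an illustration of the test described in footnote 4, the sketch below computes a z statistic for the difference between two nonresponse proportions. It assumes that each design factor multiplies the corresponding simple-random-sampling variance term; the function name, the input rates, and the sample sizes are hypothetical and are not taken from the SIPP tables.

```python
import math


def nonresponse_z(p1, n1, p2, n2, f1=1.0, f2=1.0):
    """z statistic for the difference between two nonresponse proportions.

    f1 and f2 are sample design factors. ASSUMPTION: each factor scales
    the simple-random-sampling variance p(1 - p)/n of its own group.
    """
    var1 = f1 * p1 * (1.0 - p1) / n1
    var2 = f2 * p2 * (1.0 - p2) / n2
    return (p1 - p2) / math.sqrt(var1 + var2)


# Hypothetical inputs: rates of 3.5 and 1.2 percent, 2,000 cases per group.
z = nonresponse_z(0.035, 2000, 0.012, 2000)

# Two-sided test at the 95 percent confidence level.
significant = abs(z) > 1.96
```

With design factors above 1.0 the variance grows and the same difference in rates yields a smaller z, so a clustered sample needs a larger gap to reach significance.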
Table 3.--Percent of Households Reporting Ownership
Asset Types 1/

Asset type                                  SIPP Wave 1

Savings accounts .......................       65.7
Money market deposit accounts ..........       17.1
Certificates of deposit ................       19.0
Interest earning checking accounts .....       23.1
Other interest earning assets 2/ .......       18.1
Stocks or mutual fund shares ...........       20.9
Rental property ........................       10.2
Royalties ..............................        1.4
Home ownership .........................       64.1

1/ Preliminary SIPP wave 1 results.

2/ Includes money market funds, U.S. Government securities,
municipal or corporate bonds, mortgages, and U.S. Savings Bonds.


Table 4.--Percent of Households Reporting Ownership of Selected Assets

                                                                    Survey of
                                                       Consumer     Financial
                                                       Credit       Characteristics
                                  SIPP       ISDP      Survey       of Consumers
Asset type                        Wave 1     (1979)    (1977)       (1962-65)

Savings accounts ............      65.7       73.8      76.9          59.0
Certificates of deposit .....      19.0       15.7      13.6          16.0
Stocks or mutual fund shares       20.9       19.6      24.7           --
Rental property .............      10.2       13.6       --            --
Home ownership ..............      64.1       60.5      61.0          57.0
Sample size .................    19,878      6,992     2,563         2,557
THE WEALTH AND INCOME OF AGED HOUSEHOLDS


Daniel  B.  Radner,  Social  Security  Administration 


I.   Introduction* 

The  economic  status  of  the  aged  population 
has been a topic of interest to researchers for
some time, and there has been particular
interest in recent years. In this paper we are
interested in the economic resources of aged
households. This topic is examined using data
on both income and wealth. Estimates are
presented  and  compared  for  all  aged  households 
and  for  aged  households  with  both  low  income  and 
low  wealth.   Socioeconomic  characteristics,  mean 
and  median  amounts,  and  the  composition  of 
wealth  and  income  are  discussed. 

Many  different  indicators  of  economic  well- 
being  have  been  used  by  researchers.   Income, 
specifically  money  income  before  taxes,  is  the 
most  frequently  used  measure  of  economic  well- 
being.   Money  income  before  taxes  is  also  the 
definition  of  income  used  here.   Of  course,  this 
definition  is  far  from  an  ideal  measure  of 
economic  well-being.  An  important  exclusion  is 
noncash  income,  which  is  a  significant  source  of 
economic  resources  for  most  groups  of  the 
population.   Also,  by  using  pre-tax  income,  the 
resources  available  to  the  unit  can  be  distort- 
ed.  These  deficiencies  can  affect  comparisons 
between  nonaged  and  aged  households. 

Measures of economic well-being that are confined
to income omit the wealth of the unit,
although income from assets ordinarily is
included in income. A relatively new data base,
the  1979  Income  Survey  Development  Program 
(ISDP)  file,  makes  the  examination  of  both 
income  and  wealth  possible.  1/      Wealth  and 
income  data  can  be  used  together  in  several 
different  ways.   In  this  paper  a  simple  approach 
is  used — income  and  wealth  are  used  as  a  two- 
dimensional  classification.  This  approach  does 
not  spread  the  wealth  out  over  the  expected 
lifetime  of  the  unit  as  some  methods  (e.g., 
Weisbrod  and  Hansen  1968)  do,  but  is  concerned 
with a much shorter time horizon. In this
application,  the  liquidity  of  the  assets  held 
can  be  very  important.   Although  the  most 
comprehensive  definition  available  (net  worth) 
is  used  here  as  the  measure  of  wealth,  the 
composition  of  wealth  is  examined  so  that 
liquidity  can  be  taken  into  account. 

In  a  recent  paper,  Radner  and  Vaughan  (1984) 
examined the joint distribution of wealth and
income  for  various  age  groups  and,  using  several 
different  definitions,  looked  at  the  proportion 
of  households  with  both  relatively  low  wealth 
and  relatively  low  income.   They  showed  that, 
despite  the  fact  that  average  net  worth  is 
fairly high for aged households, the joint
distribution  of  income  and  wealth  is  such  that 
the  percentage  of  aged  households  (age  65  or 
older)  with  both  relatively  low  income  and 
relatively  low  net  worth  exceeds  the  percentage 
for each age group in the 25-64 age range.

Aged  households  identified  using  one  of  those 
definitions  of  low  income  and  low  wealth  are 
examined  further  in  this  paper.  The  particular 
definition  used  is  that  the  household  had  to  be 
in the bottom 20 percent of all households


ranked  by  size  of  income  and  in  the  bottom  40 
percent  of  all  households  ranked  by  size  of  net 
worth. For both income and net worth the
rankings  were  based  on  amounts  that  had  been 
adjusted  for  size  of  household  using  an  equiva- 
lence scale  derived  from  the  U.S.  poverty  lines. 

All  estimates  shown  in  this  paper  are  on  a 
household  basis.   For  convenience  it  is  assumed 
that  the  income  and  wealth  of  the  household  are 
resources  for  all  members  of  the  household  and 
only  for  members  of  the  household.   Thus,  rela- 
tives of  persons  in  the  household  who  do  not 
live  in  the  household  (e.g.,  parents  or 
children)  are  assumed  to  have  no  claim  on  the 
resources  of  the  household,  and  the  household  is 
assumed  to  have  no  claim  on  other  resources. 
Some  of  the  estimates  shown  in  this  paper  take 
into  account  the  number  of  persons  in  the 
household. 

Because  the  characteristics  and  economic 
situations  of  aged  households  can  differ 
substantially by age, estimates are shown for
detailed  age  groups  within  the  aged  group. 
Space  limitations  and  small  sample  sizes 
restrict  the  age  detail  that  can  be  shown. 

Section  II  briefly  describes  the  data  used 
and presents definitions of important concepts
used  in  the  paper.   Section  III  provides  a  brief 
overview  of  the  income  and  wealth  of  households 
in  different  age  groups.   In  Section  IV  charac- 
teristics of  all  aged  households  and  of  aged 
households  with  both  relatively  low  income  and 
relatively  low  wealth  are  presented  and  dis- 
cussed. A  summary  and  conclusions  are  presented 
in  Section  V. 

II.   Data  and  Definitions 

The  data  used  in  this  paper  are  from  the  1979 
ISDP  file  (Radner  and  Vaughan  1984).   The  sample 
was  nationally  representative  and  both  the  low 
and  high  ends  of  the  income  distribution  were 
oversampled  slightly.   Detailed  information  on 
income, assets, debts, and socioeconomic characteristics
was obtained. The data used in this
paper  are  primarily  from  the  fifth  wave  of  this 
multi-wave  panel.   The  fifth  wave,  which  has 
most  of  the  data  on  wealth,  contains  observa- 
tions on  about  6,900  households  resulting  from 
interviews  in  January,  February,  and  March  1980. 

The  ISDP  income  data  suffer  from  the  under- 
reporting which  is  common  to  household  surveys. 
However,  overall  the  ISDP  data  appear  to  be 
better  than  the  income  data  in  the  Current 
Population  Survey  (CPS).   The  quality  of  the 
wealth data is difficult to assess, although it
is known that misreporting and nonresponse are
substantial  problems.   While  there  is  some 
evidence  of  marked  increases  in  estimates  of 
asset  ownership  compared  to  other  surveys, 
unfortunately  item  nonresponse  on  asset  values 
(for  assets  other  than  owner-occupied  housing 
and vehicles) is quite high. Thus, very substantial
proportions of the final asset value
aggregates  stem  from  values  assigned  using  "hot 
deck"  imputation.   This  is  a  problem  that  should 
be  kept  in  mind  in  interpreting  the  estimates 
shown. The post-imputation estimates of net
worth  and  of  most  asset  types  used  in  this  paper 
suffer  from  substantial  underreporting.   Finan- 
cial assets  appear  to  show  the  highest  percen- 
tage of  underreporting.   However,  a  rough 
comparison  of  survey  aggregates  to  independent 
control  aggregates  suggests  results  that  are 
similar  to  the  1962  Survey  of  Financial 
Characteristics  of  Consumers  (Projector  and 
Weiss  1966).   It  should  be  noted  that  the  ISDP 
estimates  of  the  extreme  upper  tail  of  the  net 
worth  distribution  show  a  far  lower  share  of  net 
worth than is shown by other sources of wealth
data.   This  "absence"  of  the  extreme  upper  tail 
results  at  least  in  part  from  an  emphasis  in  the 
survey  on  obtaining  data  for  low-  and 
middle-income units.

In  general  the  demographic  concepts  used  in 
this  paper  are  the  same  as  or  similar  to  those 
employed  by  the  Bureau  of  the  Census  in  its 
Annual Demographic Supplement to the CPS. 2/
The data presented pertain to the civilian
noninstitutional population living in the 50
states and the District of Columbia at the dates
of interview. All estimates are on a household
basis. Age classifications are based on the age
of the household reference person (householder)
at his or her most recent birthday. The householder
is the person (or one of the persons) in
whose  name  the  home  is  owned  or  rented. 

Income is defined on a before-tax basis and
is presented at annualized rates; that is, as
the measured three-month value times four. All
money income received by household members
during the three months preceding the month of
interview is covered, including one-time or lump
sum payments such as life insurance proceeds and
gifts. Total money income consists of the sum
of earnings, property income, OASDI, SSI, pensions,
and other income. 3/ Earnings include
wages and salaries, net income from farm self-employment,
and "draw" from nonfarm self-employment. 4/
Property income includes
interest, dividends, rent, royalties, and income
from estates and trusts. OASDI consists of
social security and railroad retirement income.
SSI consists of Supplemental Security Income,
including both state and federal amounts.
Pensions are the sum of government and private
pensions. Government pensions consist of U.S.
civil service and military retirement pensions
and state and local government pensions.
Private pensions include employer and union
pensions and annuities. Other income includes
unemployment compensation, workers' compensation,
veterans' compensation and pensions,
public assistance, educational benefits,
alimony, assistance from relatives and friends,
and lump sum payments.
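
The annualization rule above (three-month value times four) can be sketched as follows. The component names follow the income definition in the text; the dollar amounts and the function name are hypothetical and serve only to show the arithmetic.

```python
# Three-month money income by component for one household.
# Component names follow the text; every amount is hypothetical.
three_month_income = {
    "earnings": 900.0,
    "property_income": 450.0,
    "oasdi": 1500.0,
    "ssi": 0.0,
    "pensions": 300.0,
    "other_income": 100.0,
}


def annualize(three_month_amount):
    """Annualized rate: the measured three-month value times four."""
    return three_month_amount * 4


total_three_month = sum(three_month_income.values())
annualized_total = annualize(total_three_month)
```

Because one-time and lump sum payments received in the three-month window are also multiplied by four, a single large payment can raise the annualized figure well above what the household receives over a full year.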

The wealth concept used is net worth. Net
worth  consists  of  all  assets  less  all  debts 
covered  by  the  survey.   Net  worth  is  defined  to 
be  wealth  minus  unsecured  debt.   All  net  worth 
components  except  home  and  vehicle  equity  are 
valued  as  of  the  end  of  1979;  those  two 
components  are  valued  as  of  mid-1979.  5/     Wealth 
is  defined  as  the  value  of  all  assets  covered  by 
the  survey  less  any  debts  secured  by  those 
assets.   Several  items  sometimes  included  in 
estimates  of  wealth  are  not  covered;  for 
example,  social  security  wealth,  pension  wealth, 


and  trusts  are  excluded. 

Wealth  is  the  sum  of  the  following  items: 
home  equity  (owner-occupied);  durable  goods 
(equity  in  vehicles  plus  market  value  of  house- 
hold durables);  business  equity  (nonfarm  and 
farm);  liquid  financial  assets  (cash,  checking 
accounts,  passbook  savings  accounts,  U.S. 
savings  bonds);  nonliquid  financial  assets 
(bonds,  CD's,  stocks,  mutual  fund  shares);  other 
assets  (e.g.,  equity  in  other  property,  equity 
in nonactive business interest). Unsecured debt
includes installment and noninstallment debt,
unpaid  medical  bills,  and  educational  loans. 
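
A minimal sketch of the two definitions above, assuming each asset component has already been valued net of any debt secured by it (as the wealth definition states). The dictionary keys mirror the components listed in the text; all dollar amounts and function names are hypothetical.

```python
# Asset components, each already net of debts secured by that asset.
# Keys mirror the text; every amount is hypothetical.
assets = {
    "home_equity": 30000.0,
    "durable_goods": 6000.0,
    "business_equity": 0.0,
    "liquid_financial_assets": 4000.0,
    "nonliquid_financial_assets": 2500.0,
    "other_assets": 0.0,
}

unsecured_debt = {
    "installment_and_noninstallment_debt": 1200.0,
    "unpaid_medical_bills": 300.0,
    "educational_loans": 0.0,
}


def wealth(asset_values):
    """Wealth: the sum of the asset components."""
    return sum(asset_values.values())


def net_worth(asset_values, unsecured):
    """Net worth: wealth minus unsecured debt."""
    return wealth(asset_values) - sum(unsecured.values())


household_wealth = wealth(assets)
household_net_worth = net_worth(assets, unsecured_debt)
```

Keeping the components separate, rather than storing only the total, is what allows the composition of wealth (and therefore liquidity) to be examined later.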

III.   General  Patterns  of  Income  and  Wealth  by 
Age  of  Householder 

In  this  section  an  overview  of  the  relation- 
ship between  age  and  income  and  wealth  is 
presented.   Mean  and  median  amounts  are  shown, 
and the dispersion of the income and wealth of
aged households is discussed. Mean and median
total money income and net worth are shown for
age (of householder) groups in Table 1. 6/ 7/
Looking  at  income,  these  estimates  show  the 
familiar  pattern  of  relatively  low  mean  and 
median  at  young  and  old  ages,  with  a  peak  in  the 
45-54  age  group.   The  mean  (median)  for  the  65 
and  over  group  is  below  the  mean  (median)  for 
the  youngest  age  group.   Of  course,  the  median 
is  below  the  mean  for  every  age  group.   The 
pattern  for  net  worth  is  somewhat  different. 
Mean  amounts  are  low  for  the  younger  age  groups, 
peak in the 55-64 group, and then decline.
However,  the  mean  (median)  for  the  65  and  over 
group  is  far  above  the  mean  (median)  for  the 
three  youngest  age  groups.   In  general,  the 
median  is  about  one  half  the  mean  in  each  age 
group.   Within  the  detailed  aged  groups,  both 
income  and  net  worth  decline  as  age  increases, 
using  both  means  and  medians,  with  the  75  and 
over  group  substantially  below  the  65-69  group. 
One  way  of  looking  at  the  mean  amounts  is  to 
compute  relative  means.   The  relative  mean  is 
the  mean  for  the  group  divided  by  the  mean  for 
all  households.   Thus,  the  relative  mean  income 
for households age 65 and over is $13,070/$21,260
= 0.61. Although households age 65 or over have
a relative mean income that is only 61 percent
of  the  overall  mean,  their  relative  mean  net 
worth  is  27  percent  above  the  overall  mean. 
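
The relative mean calculation can be reproduced directly from the two dollar figures quoted in the text. The function name is an illustrative choice.

```python
def relative_mean(group_mean, overall_mean):
    """Mean for the group divided by the mean for all households."""
    return group_mean / overall_mean


# Mean income figures quoted in the text.
relative_income_65_plus = relative_mean(13070, 21260)

print(round(relative_income_65_plus, 2))  # 0.61
```

A relative mean of 1.27 for net worth, by the same definition, corresponds to a group mean 27 percent above the overall mean.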

It  is  well-known  that  households  with  aged 
heads  are,  on  average,  smaller  than  households 
with  nonaged  heads.   These  differences  in 
household size are adjusted for in this paper
using  an  equivalence  scale  based  on  the  U.S. 
poverty  lines  (U.S.  Bureau  of  the  Census  1981b, 
Table  A-3).  8/  A  one-person  household  is  taken 
as the base, and the income and net worth of
each  household  are  divided  by  the  appropriate 
scale  value  to  obtain  amounts  adjusted  for  size 
of  household.  9/     The  relative  means  of  the 
35-54  age  groups,  which  have  relatively  large 
households,  show  declines  from  the  unadjusted 
estimates.   The  55-64  and  65  and  over  groups, 
which  have  relatively  small  households,  show 
increases.   The  relative  mean  income  for  the  65 
and  over  group  rises  to  0.79,  while  the  relative 
mean net worth increases to 1.59. In general,
the  relative  means  of  aged  households  rise  about 
25  percent. 
There is substantial dispersion that cannot
be  seen  by  just  looking  at  the  averages.   In  the 
distribution  by  size  of  income  (after  adjustment 
for  size  of  household),  substantial  dispersion 
among quintiles is present for the 65 and over
age  group.   For  example,  even  though  32  percent 
of  households  are  in  the  bottom  income  quintile, 
11  percent  appear  in  the  top  quintile.   The 
distribution  among  net  worth  quintiles  also 
shows  substantial  dispersion,  with  24  percent  in 
the  bottom  two  quintiles  and  34  percent  in  the 
top  quintile. 

IV.  Characteristics  of  Aged  Households 

In  this  section  all  aged  households  and  aged 
households  with  both  relatively  low  income  and 
relatively  low  net  worth  are  discussed  and 
compared.  A  household  is  defined  to  have 
relatively low income if it is in the bottom
quintile (20%) in the overall (all ages) distribution
of households by size of income. A
household  is  defined  to  have  relatively  low  net 
worth  if  it  is  in  the  bottom  two  quintiles  (40%) 
in  the  overall  (all  ages)  distribution  of  house- 
holds by  size  of  net  worth.   In  both  cases  the 
amounts  were  adjusted  for  size  of  household.  10/ 
Households  that  have  both  low  income  and  low  net 
worth using these definitions will be called
"LILNW" (Low Income and Low Net Worth)
households. 
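
The classification rule can be sketched as below. This is a simplified illustration only: it ignores sample weights, uses an unweighted cutoff rule of my own choosing, and the scale values and dollar amounts are hypothetical rather than the poverty-line scale used in the paper.

```python
def percentile_cutoff(values, fraction):
    """Largest value among the lowest `fraction` of the values (unweighted)."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]


def flag_lilnw(households):
    """Flag households in the bottom 20 percent by size-adjusted income
    AND the bottom 40 percent by size-adjusted net worth."""
    adj_income = [h["income"] / h["scale"] for h in households]
    adj_worth = [h["net_worth"] / h["scale"] for h in households]
    income_cut = percentile_cutoff(adj_income, 0.20)
    worth_cut = percentile_cutoff(adj_worth, 0.40)
    return [
        inc <= income_cut and nw <= worth_cut
        for inc, nw in zip(adj_income, adj_worth)
    ]


# Hypothetical households; "scale" is the equivalence scale value
# (1.0 for the one-person base household).
households = [
    {"income": 4000, "net_worth": 3000, "scale": 1.0},
    {"income": 30000, "net_worth": 1000, "scale": 2.0},
    {"income": 9000, "net_worth": 80000, "scale": 1.0},
    {"income": 25000, "net_worth": 50000, "scale": 1.25},
    {"income": 60000, "net_worth": 200000, "scale": 2.0},
]

flags = flag_lilnw(households)
```

In this toy data the second household has low adjusted net worth but is not flagged, because both conditions must hold at once; that is the sense in which the two measures form a two-dimensional classification.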

Because  the  estimates  have  been  adjusted 
using a scale based on the poverty lines, the
dollar  amount  cutoffs  that  define  the  LILNW 
group  differ  by  household  size  and,  for  one-  and 
two-person  households,  by  age  of  householder. 
For  one-person  households  with  householder  age 
65  or  over,  the  upper  (annualized)  income  bound 
for  the  LILNW  group  is  $5,047,  while  the  upper 
net  worth  bound  is  $9,875.   For  two-person 
households  with  householder  age  65  or  over,  the 
income  bound  is  $6,369,  while  the  net  worth 
bound  is  $12,462.   The  bounds  are  about  nine 
percent  higher  if  the  householder  is  under  age 
65. 11/
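
As a consistency check on the bounds quoted above, the sketch below derives the two-person scale value implied by the income bounds and applies it to the one-person net worth bound. The implied ratio is computed here from the quoted figures; it is not a value stated in the paper.

```python
# Upper bounds for households with householder age 65 or over,
# keyed by household size (figures quoted in the text).
income_bound = {1: 5047, 2: 6369}
net_worth_bound = {1: 9875, 2: 12462}

# Scale value for a two-person household relative to the one-person base,
# as implied by the income bounds.
implied_scale = income_bound[2] / income_bound[1]

# The same ratio applied to the one-person net worth bound.
predicted_two_person_net_worth = net_worth_bound[1] * implied_scale
```

The prediction rounds to the quoted two-person net worth bound, which is what one would expect if a single equivalence scale is applied to both income and net worth.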

Thirteen  percent  of  all  households  are  in  the 
LILNW  group.   The  percentages  range  from  9 
percent  for  households  age  55-64  to  21  percent 
for  households  under  25.   Fifteen  percent  of  all 
households  65  and  over  are  in  the  LILNW  group; 
the  percentage  is  11  in  the  65-69  group,  15  in 
the  70-74  group,  and  19  in  the  75  and  over 
group.  12/  Thus,  substantial  percentages  of  the 
aged  groups  have  both  relatively  low  income  and 
relatively  low  net  worth.   We  have  seen  that  24 
percent of households age 65 and over are in the
bottom two net worth quintiles, while 32 percent
are in the bottom income quintile. Since only
15  percent  are  in  both  groups,  it  can  be  seen 
that  about  two  thirds  of  those  with  low  net 
worth  also  had  low  income  (15/24),  and  about  one 
half  of  those  with  low  income  also  had  low  net 
worth  (15/32). 
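
The two conditional shares follow from the three percentages quoted above; the variable names are illustrative.

```python
# Percentages of households age 65 and over, as quoted in the text.
pct_low_income = 32      # bottom income quintile
pct_low_net_worth = 24   # bottom two net worth quintiles
pct_both = 15            # in both groups (LILNW)

# Share of low net worth households that also have low income.
share_of_low_net_worth = pct_both / pct_low_net_worth

# Share of low income households that also have low net worth.
share_of_low_income = pct_both / pct_low_income
```

The first share is 0.625 (about two thirds) and the second is roughly 0.47 (about one half), matching the fractions given in the text.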

We  are  interested  here  in  who  the  aged  LILNW 
households  are  and  what  resources  they  have. 
Table  2  shows  the  composition  of  the  LILNW  group 
compared  to  all  households  for  the  65  and  over 
and 75 and over age groups. Looking at sex of
householder,  46  percent  of  all  households  65  and 
over  have  a  female  householder,  but  75  percent 
of  the  LILNW  households  have  a  female  house- 


holder.  For  the  75  and  over  group,  86  percent 
of  the  LILNW  households  have  a  female  house- 
holder.  For  household  size,  the  LILNW  group 
shows  substantially  more  one-person  households 
than  the  population  as  a  whole  in  the  age  group 
(76  percent  compared  to  44  percent  in  the  65  and 
over  group).   The  LILNW  group  shows  substantial- 
ly fewer  married  spouse  present  householders  (11 
percent  compared  to  43  percent  in  the  65  and 
over  group)  and  more  widowed,  divorced,  and 
other  householders.   Looking  at  these  three 
characteristics  together,  households  of  size  one 
with  a  widowed  female  householder  comprise  49 
percent of the LILNW 65 and over group, but only
27  percent  of  the  total  in  that  age  group.   The 
corresponding  percentages  for  the  75  and  over 
group  are  68  percent  in  the  LILNW  group  and  only 
37  percent  overall. 

Table  3  shows  the  composition  of  total 
income,  mean  amounts  of  income  types,  and  the 
percentage  of  households  with  various  types  of 
income  for  all  households  in  the  65  and  over  and 
75  and  over  groups  and  for  LILNW  households  in 
those  age  groups.   Looking  at  all  households  65 
and  over,  OASDI  accounts  for  about  one  third  of 
total  income,  while  earnings  and  property  income 
each  constitute  about  one  fourth,  and  pension 
income accounts for about one tenth. The composition
of income differs markedly between LILNW
and  all  households.   Looking  at  the  65  and  over 
group,  OASDI  benefits  are  far  more  important  in 
the  LILNW  group  (about  two  thirds  of  total 
income), although the mean amount of benefits in
that  group  is  far  below  the  mean  for  all  house- 
holds.  SSI  is  also  more  important  in  the  LILNW 
group  (12  percent  compared  to  1  percent).   All 
other  income  sources  are  less  important  in  the 
LILNW  group,  with  earnings  (9  percent  compared 
to  25  percent)  and  property  income  (2  percent 
compared  to  23  percent)  showing  large  differ- 
ences.  The  patterns  are  similar  for  the  75  and 
over  group.   Mean  total  income  for  the  LILNW 
group  is  only  29  percent  of  the  mean  for  all 
households  in  the  65  and  over  group  and  32 
percent  in  the  75  and  over  group.   Because 
median  income  is  quite  close  to  mean  income  in 
the  LILNW  group,  but  not  for  all  households,  the 
ratio  of  the  LILNW  median  to  the  overall  median 
is  higher — 42  percent  for  the  65  and  over  group 
and  48  percent  for  the  75  and  over  group.   For 
both  mean  and  median,  there  is  less  difference 
between  the  65  and  over  and  75  and  over  groups 
for  the  LILNW  group  than  for  all  households. 
Panel  C  of  Table  3  shows  the  percentage  of  the 
group that received each type of income. Households
in the LILNW group receive SSI much more
often  (32  percent  compared  to  11  percent  for  the 
65  and  over  group),  and  earnings,  property 
income,  and  pensions  much  less  often  than  all 
households  in  the  age  group.  The  percentages 
receiving  OASDI  (93-94  percent  for  the  65  and 
over  group)  show  little  difference. 

Table  4  shows  the  composition  of  net  worth 
for  LILNW  households  and  all  households  in  the 
65  and  over  and  75  and  over  age  groups.  For  all 
households 65 and over, home equity and financial
assets each account for about one third of
net  worth,  while  durable  goods  constitute  about 
one  tenth.  In  contrast,  in  the  LILNW  group  65 
and  over,  home  equity  is  about  one  fifth  of  net 
worth  and  financial  assets  are  about  one  third, 
but durable goods constitute almost three
fifths. The LILNW mean of each asset is far
below the mean for all households in the age
group. The LILNW group shows larger shares for
liquid  financial  assets  and  durable  goods, 
negligible  shares  for  nonliquid  financial 
assets,  business  equity,  and  other  assets,  and  a 
lower  share  for  home  equity.   Mean  and  median 
amounts  of  net  worth  are  very  low  in  the  LILNW 
group,  even  below  the  income  amounts.   If 
durable  goods  are  excluded  from  the  mean  (on  the 
grounds  that  that  type  of  asset  cannot  easily  be 
liquidated  or  borrowed  against),  then  the  mean 
amount  is  $1,260  (including  home  equity). 

Panel  C  of  Table  4  shows  the  percentage  of 
households  with  each  type  of  asset  or  debt. 
Home  equity,  nonliquid  financial  assets, 
business  equity,  and  other  assets  are  held  by 
few  households  in  the  LILNW  group.   Durable 
goods  and  liquid  financial  assets  are  held  by 
high  percentages.   Fewer  households  in  the  LILNW 
group  have  unsecured  debt.   Differences  between 
the  65  and  over  and  75  and  over  groups  are 
relatively  small. 

V.   Summary  and  Conclusions 

We have seen that, on the average, aged
households  have  relatively  low  incomes  and 
relatively  high  net  worth.   However,  because 
there  is  substantial  dispersion  in  these 
amounts,  the  averages  do  not  tell  the  whole 
story.  For  example,  at  the  end  of  1979  about 
2  1/2  million  aged  households  had  both 
relatively  low  income  (mostly  OASDI  and  SSI)  and 
relatively  low  net  worth  (mostly  durable  goods). 
However,  there  also  are  aged  households  with 
substantial amounts of both income and net
worth. Thus, general statements about how well
off "the aged" are can frequently be misleading
and should be interpreted with caution.

FOOTNOTES 

*The  author  is  greatly  indebted  to  Sharon 
Johnson,  who  prepared  the  estimates,  and  to 
Benjamin  Bridges  and  Denton  Vaughan  for  their 
helpful  comments. 

1/ In this paper we were not able to
incorporate the data on taxes and noncash
income that were collected in the ISDP.

2/   See  U.S.  Bureau  of  the  Census  (1981a)  for 
the  CPS  definitions. 

3/ In the case of income-producing assets we
include both the income (in income) and the
asset value (in net worth).

4/ Net income from nonfarm self-employment was
not  obtained  in  Wave  5  of  the  ISDP.  The 
"draw"  is  the  amount  of  salary  or  money 
taken  out  of  the  business  for  living 
expenses. 

5/  Those  two  items  were  collected  in  Wave  2  of 
the  ISDP  for  most  households. 

6/ The estimates in this paper differ slightly
from those in Radner and Vaughan (1984)
because these estimates reflect our approximate
correction of errors that we believe
exist in the Wave 5 ISDP public use file.
Two errors relating to the processing of
data from the 3-month-6-month property
income recall test appear to have been made
in the construction of household summary
amounts of property income from amounts of
specific income types. First, all of
Rotation 3 was treated as a 6-month
reference period group, whereas only one
half of Rotation 3 had actually used the
6-month reference period. Second, for all
of Rotation 3, royalty and estate and trust
incomes were treated as though the 6-month
reference period had been used; actually, a
3-month reference period was used for those
income types for all households. These two
errors resulted in an underestimate of about
$3 1/2 billion in property income and total
income on the household records. We were
able to make only an approximate correction
of these errors in the estimates shown in
this paper.
7/ The estimates in this paper exclude a few
observations with negative household income.
8/ The choice of this scale was arbitrary.
However, this is a familiar scale that is
much less extreme than a per capita adjustment.
The extensive literature on equivalence
scales contains many estimated scales
that could have been used. In some cases,
the estimates shown in this paper could be
sensitive to the choice of a scale.
9/  The  scale  values  used  are:   1  person  (under 
65),  1.024;  1  person  (65+),  0.943;  2  persons 
(under  65),  1.322;  2  persons  (65+),  1.190;  3 
persons,  1.568;  4  persons,  2.009;  5  persons, 
2.379;  6  persons,  2.687;  7  persons  or  more, 
3.329. 

10/  It  should  be  noted  that  these  definitions 
are  somewhat  arbitrary.   The  bottom  two 
quintiles  were  used  instead  of  the  bottom 
quintile  for  net  worth  because  the  implied 
net  worth  amounts  in  the  bottom  quintile 
were  extremely  small.   Looking  at  these  low 
income  and  low  net  worth  households  does  not 
require  a  completely  specified  ranking  of 
households  according  to  economic  well-being. 
Also,  we  do  not  claim  that  all  low  income 
and  low  net  worth  households  are  worse  off 
than  all  other  households. 

11/  The  bounds  for  other  household  sizes  can  be 
derived  using  the  scale  values  shown  in 
footnote 9. The income bounds are about 45
percent  above  the  1979  poverty  lines. 

12/  Using  estimates  unadjusted  for  household 
size,  20  percent  of  aged  households  were  in 
the  LILNW  group,  compared  to  13  percent  for 
all ages (Radner and Vaughan 1984, Table
16). For comparison, in 1979, about 12
percent  of  all  households  and  18  percent  of 
aged  households  were  poor  (U.S.  Bureau  of 
the  Census  1982,  Table  22). 
Table 1.--Mean and Median Income and Net Worth, by Age of Householder, 1979

                                  Weighted
                                  Number of      Annualized
                        Sample    Households     Income ($)          Net Worth ($)
Age of Householder      Cases     (Thousands)    Mean      Median    Mean       Median

All Ages                6,922     82,211         21,260    16,500     62,430    25,770

Under 25                  621      6,613         14,150    12,860      8,880     3,450
25-34                   1,441     19,272         20,160    18,000     24,520    12,110
35-44                   1,089     14,014         25,390    20,430     64,950    29,820
45-54                   1,049     12,975         28,400    24,180     79,120    39,180
55-64                   1,155     12,722         25,500    19,330    105,740    51,440
65 and over             1,567     16,614         13,070     8,630     79,380    38,720
  65-69                   489      5,570         15,560     9,750     90,040    47,300
  70-74                   411      4,818         12,790     9,040     85,480    45,440
  75 and over             667      6,226         11,050     7,460     65,130    28,410

Dollar amounts are rounded to the nearest $10.


Table  2. — Percentage  Composition  of  Aged  Households,  1979 


Type  of  Household 


Low  Income 

and  Low 
Net  Worth 


Low   Income 

and   Low 
Net   Worth 


Sex  of  Householder 
Female 
Male 

Household  Size 

1  Person 

2  Persons  or  more 

Marital  Status  of 
Householder 
Married,    Spouse  Present 
Widowed 
Divorced 
Other 


Household  Size    1, 
Householder  Widowed 
Female  27 

Sample  Cases  1,567 

Weighted  Number  of  16,614 

Households   (thousands) 


667 
6,226 




USING  SUBJECTIVE  ASSESSMENTS  OF  INCOME  TO  ESTIMATE 

FAMILY  EQUIVALENCE  SCALES:  A  REPORT  ON  WORK  IN  PROGRESS* 

Denton  R.  Vaughan 

Social  Security  Administration 


This  paper  will  present  a  short  discussion  of  the 
estimation  of  family  equivalence  scales  based  on 
individuals' assessments of their own current income
level or their subjective judgments about the income
required to provide specified levels of satisfaction
or to sustain a given standard of living.1 A scale
estimated on the basis of individuals' subjective
assessments  of  their  current  income  is  presented  and 
the  results  are  discussed  in  terms  of  scales 
obtained  by  "traditional"  methods  and  other 
"subjective-based"  scales. 

Techniques  for  establishing  equivalence. 

Traditional  approaches. — A  number  of  techniques 
have  been  used  to  estimate  family  size  equivalence 
scales.  Until  recently,  most  approaches  have  relied 
either  on  some  manifestation  of  consumer  behavior 
that  is  interpreted  as  a  welfare  indicator  or  on 
scientific  or  quasi-scientific  dietary  standards. 

The  general  class  of  Engel  curve  analyses  falls 
into  the  first  category  and  the  most  frequently  used 
welfare  indicator  has  been  the  percentage  of  income 
or  expenditures  devoted  to  food  as  affected  by 
family  size.  As  food  has  become  a  less  dominant 
element  of  family  budgets  in  Western  Europe  and  the 
United  States,  and  therefore  less  representative  of 
the  overall  material  standard  of  living  of  the 
general  population,  Engel  curve  analyses  have  been 
extended  to  broader  sets  of  consumption  goods  such 
as  necessities  (e.g.,  Sydenstriker  and  King  1921; 
Prais  and  Houthakker  1955;  Watts  1967;  Seneca  and 
Taussig  1971).  Other  Engel  curve  applications  have 
made  use  of  adult  consumption  goods  such  as  alcohol 
and  tobacco  (Nicholson  1949,  1976)  or  have  analyzed 
variation  in  the  percentage  of  income  devoted  to 
savings  (U.S.  Department  of  Labor  1948). 

More  recently,  economists  have  attempted  to  define 
equivalence  on  the  basis  of  demand  equations 
estimated  for  the  full  set  of  consumption  goods.  In 
this  approach,  equivalence  estimates  are  derived 
from  variations  in  demand  that  are  associated  with 
family  size  and  composition  differences  (Kakwani 
1977;  van  der  Gaag  and  Smolensky  1982;  Danziger,  van 
der  Gaag,  Smolensky  and  Taussig  1984). 

Equivalence  scales  based  on  dietary  adequacy  take 
as  a  point  of  departure  a  recognized  nutritional 
standard,  such  as  provided  by  the  U.S.  National 
Research  Council  (NRC),  and  translate  the  standard 
into  a  per  capita  food  plan  appropriate  to  a  given 
level  of  living.  Since  nutritional  standards  such 
as  the  one  developed  by  the  NRC  incorporate 
variations  by  age  and  sex,  their  translation  into 
food  plans  necessarily  captures  family  composition 
effects  (Peterkin  1976).  By  taking  into  account  the 
interaction  between  per  capita  income,  per  capita 
food  plan  costs  and  the  nutritional  quality  of  diet 
(by  family  size),  such  dietary  equivalence  scales 
are  said  to  capture  economies  of  scale  as  well  as 
compositional  effects  (e.g.  Kerr  and  Peterkin  1976, 
p.  72-73).  The  equivalence  scale  incorporated  in 
the  official  Federal  poverty  line  is  based  in  large 
part  on  an  earlier  version  of  the  U.S.  Department  of 
Agriculture  scale  described  by  Peterkin.  However, 
ad  hoc  adjustments  were  made  to  the  ratios  for 
families  of  one  and  two  to  account  for  the  supposed 
higher  fixed  costs  of  operating  small  households 
(Orshansky  1965,  p.  9). 


Direct  subjective  measures.  Beginning  in  the 
early  1970's,  the  economist  van  Praag  and  his 
colleagues  at  Leyden  University  in  the  Netherlands 
(van  Praag  1971;  van  Praag  and  Kapteyn  1973;  Kapteyn 
and van Praag 1976; Goedhart, Halberstadt, Kapteyn
and  van  Praag  1977;  and  van  Herwaarden,  Kapteyn  and 
van  Praag  1977)  and  sociologists  in  the  United 
States  (Rainwater  1973  and  1974;  Vaughan  and 
Lancaster  1979;  Dubnoff,  Vaughan  and  Lancaster  1981) 
initiated  the  use  of  individuals'  perceived  income 
needs  to  estimate  family  equivalence  scales.  By  the 
early 1980's a few additional U.S. social scientists
had  begun  to  exploit  the  "subjective"  approach  using 
newly  available  U.S.  data  bases  (Colasanto,  Kapteyn 
and  van  der  Gaag  1984;  Danziger,  van  der  Gaag, 
Taussig and Smolensky 1984) in conjunction with
Dutch  economists  then  resident  in  the  United  States. 

Actually,  from  a  theoretical  perspective  the  use 
of  the  term  "subjective"  to  differentiate  this  class 
of  measures  from  more  orthodox  approaches  based  on 
consumer  behavior,  and  most  particularly  consumption 
choices,  is  somewhat  misleading.  In  economics, 
behavior  (consumer  choice)  is  generally  considered 
to  be  revelatory  of  subjective  states,  i.e., 
preferences  are  "revealed"  by  behavior,  and  much  of 
the  theorizing  underlying  the  study  of  consumer 
behavior  and  welfare  economics  relies  on  primitive 
notions  of  subjective  states.  In  the  context  of 
constraints,  it  is  these  states  which  essentially 
motivate,  if  not  determine,  behavior.  So  in  an 
important  sense  what  is  novel  about  using  subjective 
assessments  to  make  inferences  about  preferences, 
utility,  or  welfare  is  not  that  in  being  subjective 
they  are  somehow  set  apart  from  the  concerns  of 
economists.  Rather  it  is  that  the  orthodox 
economist  is  more  familiar,  and  therefore  more 
comfortable,  with  employing  behavior  as  an  indicator 
of  the  underlying  construct  of  welfare  than  with 
using  the  direct  verbal  representations  of  those 
constructs  or  states  (Muellbauer  1980,  p.  153; 
Pollak  and  Wales  1979,  p.  219).  Weighing  in  on  the 
other  side  of  the  issue,  however,  Sen  (1982,  pp.  71- 
72)  has  noted  that  the  "idea  that  behavior  is  the 
one  real  source  of  information  [about  preferences] 
is  extremely  limiting  for  empirical  work  and  is  not 
easy  to  justify  in  terms  of  the  methodological 
requirements  of  our  discipline  [viz.  economics]." 

In  any  case,  in  the  present  context  I  will  use  the 
phrase  direct  subjective  measures  to  at  once 
highlight  the  theoretical  relationship  between  this 
class  of  measures  and  those  employed  within  a 
"revealed  preference"  framework  while  correctly 
stressing  that  they  are  based  on  consumers' 
perceptions  or  subjective  assessments  rather  than  on 
consumer  behavior. 

Types  of  direct  subjective  measures. 

There  are  three  basic  types  of  direct  subjective 
measures  that  have  been  used  to  estimate  family 
equivalence  scales.   They  are: 

1)  The  income-subjective  welfare  metric. — This 
approach  was  pioneered  and  systematically  exploited 
by  van  Praag  and  his  associates  (van  Praag  1968, 
1971; van Praag and Kapteyn 1973), but was also
briefly  experimented  with  by  Andrews  and  Withey 
(1976,  pp.  228-229,   378).   Respondents  are  asked  a 
series of questions which permit estimation of the
function  which  relates  income  to  the  welfare  or 
utility  derived  from  income  as  perceived  by  the 
individual.  The  parameters  of  the  estimated 
function  may  then  be  used  in  a  variety  of 
applications,  such  as  the  construction  of  family 
size  equivalence  scales. 

2) The income-living level approach. This
technique  relies  on  respondents'  estimates  of  the 
money  income  amounts  required  to  attain  a  given 
level  of  living  (getting  along,  living  comfortably, 
etc.). Systematic variation in the subjective
income  requirements  associated  with  the  various 
levels  of  living  is  used  to  model  differences  in 
underlying  needs  and  to  construct  equivalence 
scales. 

Rainwater  (1974)  was  the  first  to  obtain  estimates 
for  a  range  of  alternative  levels  simultaneously. 
More  recently  the  Expert  Committee  on  Family  Budget 
Revisions,  convened  by  the  Bureau  of  Labor 
Statistics,  recommended  a  program  of  research  to 
explore  the  feasibility  of  estimating  a  set  of 
living  level  standards  at  the  national  level  (Watts 
1980).  Research  funded  independently  by  the  U.S. 
National  Science  Foundation  (see  Dubnoff  and  Strate 
1984)  is  pursuing  these  and  related  issues. 

3)  Income  rating  scales.  This  method  relies  on 
obtaining  the  respondent's  rating  of  his  or  her 
current  income  using  a  set  of  ranked  categories 
(such  as  from  "delighted"  to  "terrible")  believed  to 
extend  over  most,  if  not  all,  of  the  continuum  of 
satisfaction  or  welfare.  The  sensitivity  of  such 
responses  to  family  size  was  first  demonstrated  in 
1979  (Vaughan  and  Lancaster)  and  the  first 
equivalence  scales  per  se  were  estimated  using 
current  income  rating  measures  in  1981  (Dubnoff, 
Vaughan  and  Lancaster).  This  class  of  measures  may 
be  thought  of  as  a  subset  of  the  income/welfare 
metric  approach  (Vaughan  and  Lancaster  1980). 


Estimation of equivalence scales using income rating
measures: An example.

This  section  of  the  paper  reports  on  the 
preliminary  stages  of  a  project  undertaken  to 
evaluate  the  usefulness  of  subjective  assessments 
for  comparing  differences  in  the  level  of  economic 
welfare  experienced  by  significant  subgroups  of  the 
population. Developing equivalence estimates constitutes
a major segment of this project. A number of
the  questions  which  will  be  addressed  in  the  course 
of  this  work  cannot  be  dealt  with  using  the  data  set 
currently  available,  so  the  purpose  of  the  current 
exercise  is  to  give  the  reader  the  flavor  of  how  one 
might  address  such  issues  on  the  basis  of  income 
rating  scales. 

The  scale  estimation  procedure.  The  general 
approach  is  similar  to  that  first  employed  by 
Dubnoff,  Lancaster  and  the  current  author  in  an 
earlier  paper  (Dubnoff,  Vaughan  and  Lancaster  1981). 
Essentially,  a  measure  of  satisfaction  with  current 
income  is  regressed  on  income,  family  size,  and  a 
set  of  additional  variables  considered  to  be 
significant.  The  estimating  equation  is  of  the  form: 


\[ S_y = \beta_0 + \beta_1 \ln FS + \beta_2 \ln Y + \beta_3 C_1 + \cdots + \beta_k C_{k-2} + \epsilon \tag{1} \]

where

$S_y$ = satisfaction with income, recoded to
represent a zero to one welfare continuum,

$FS$ = family size,

$Y$ = monthly family income gross of tax,

$C$ = a set of (k - 2) control variables,

$\beta_0$ thru $\beta_k$ are the regression coefficients and $\epsilon$ is
the  disturbance  term  assumed  to  have  the  required 
characteristics.  The  control  variables  in  the 
present  context  are  entered  as  dummy  variables  and 
include,  in  this  initial  specification  (note 
variable  abbreviations  in  parentheses):2 

*  Financial  situation  in  1979  judged  better  than 
in  1978  (TB), 

*  Financial  situation  in  1979  judged  worse  than 
in  1978  (TW), 

*  Financial  situation  in  1980  expected  to  be 
better  than  in  1979  (NB), 

*  Financial  situation  in  1980  expected  to  be 
worse than in 1979 (NW), and

*  Family  reference  person  is  age  65  or  over 
(AGE). 

Depending  on  theoretical  considerations  and  the 
reasonableness  of  the  results  these  variables  may  be 
directly included in the scale, for example,
differentiating  the  scale  by  age  of  head,  or  may  be 
used  to  control  for  effects  that  one  does  not  want 
the  scale  to  reflect.  More  information  about  the 
satisfaction  measure  and  the  income  and  the  control 
variables  is  available  from  the  author  on  request.1 
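
As a rough illustration of the estimation step, the sketch below fits an equation of the form (1) by ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic and noise-free, the "true" coefficients are invented, only an AGE dummy stands in for the set of control variables, and a small pure-Python normal-equations solver replaces whatever statistical package was actually used.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical coefficients (intercept, ln FS, ln Y, AGE), not the paper's estimates.
TRUE_BETA = [0.10, -0.03, 0.08, 0.07]

random.seed(0)
rows, y = [], []
for _ in range(200):
    fs = random.randint(1, 8)              # family size
    income = random.uniform(300, 4000)     # monthly family income
    age = random.choice([0, 1])            # 1 if reference person is 65 or over
    x = [1.0, math.log(fs), math.log(income), float(age)]
    rows.append(x)
    y.append(sum(b * v for b, v in zip(TRUE_BETA, x)))  # noise-free satisfaction

beta_hat = ols(rows, y)
# Ratio of the family size and income coefficients, sign reversed.
elasticity = -beta_hat[1] / beta_hat[2]
```

Because the synthetic outcome contains no disturbance term, the fitted coefficients recover the invented ones up to rounding error; with real survey data they would of course carry sampling error.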

Estimation  results.  Based  on  data  from  the  fifth 
and  sixth  waves  of  the  1979  Research  Panel  of  the 
Income Survey Development Program (ISDP) (see Ycas
and  Lininger  1981),  (1)  is  estimated  as 


0.1139TW  -  0.0276NB  -   0.0508NW  +   0.0763AGE 
(21.01)     (-4.23)      (-9.53)      (11.83) 

(R^2 = .263, N = 5,067)


This  rather  simple  model  explains  a  little  more 
than  25  percent  of  the  variation  in  satisfaction 
with  income.  All  variables  in  the  model  are 
statistically  significant  at  conventional  levels  (t- 
values  are  in  parentheses).  As  expected,  income  is 
positively  related  to  satisfaction  with  income  and 
the  effect  of  family  size  is  negative;  holding 
income  constant,  satisfaction  with  income  decreases 
as  family  size  increases.  The  age  coefficient  has 
the  expected  sign  and  indicates  that,  controlling 
for  the  other  variables  in  the  model,  the  aged 
require  less  income  to  reach  a  given  level  of 
satisfaction  than  the  non-aged.  However,  as  we 
shall  see,  the  magnitude  of  the  age  effect  is  much 
larger  than  seems  reasonable. 

Other  things  being  equal,  individuals  who  felt 
that  their  financial  situation  in  1979  was  better 
than  1978  are  more  satisfied  than  others,  while 
those  who  felt  that  their  situation  in  1979  was 
worse  than  in  1978  derive  a  great  deal  less 
satisfaction  from  a  given  level  of  income.  Those 
who  expected  their  financial  situation  to  change  in 
1980,  whether  for  better  or  worse,  are  considerably 
less  satisfied  than  others  at  similar  income  levels. 
In fact, respondents' subjective assessments of
financial  change  (retrospective  for  1979  compared  to 
1978  and  prospective  for  1980  compared  to  1979)  are 
more  important  in  accounting  for  variation  in 
satisfaction  with  income  than  current   income  and 


family  size. > 

Deriving the scales. The purpose of an equivalence
scale is to establish the relative incomes necessary
to provide equal "satisfaction", "welfare", or
"utility" for families in differing situations. In
the present context we wish to construct a scale
taking into account family size and age of head and
will use non-aged families of size four as the
reference family.

Restricting ourselves for the moment to simply
family size effects among the non-aged, we need to
know the ratios of income

\[ Y_j / Y_4 \]

such that satisfaction with income (Sy) is the same
for families of size one to three and five or more
as for families of size four, that is:

\[ S_{yj} = S_{y4} \tag{2} \]

Using equation (1), less the term representing the
set of control variables which will be reintroduced
below (see (7)),

\[ S_{yj} = \beta_0 + \beta_1 \ln FS_j + \beta_2 \ln Y_j \tag{3} \]

\[ S_{y4} = \beta_0 + \beta_1 \ln FS_4 + \beta_2 \ln Y_4 \tag{4} \]

Combining (3) and (4) as suggested by (2),
simplifying and arranging the terms for income and
family size on opposing sides, and dividing through
by $\beta_2$ gives

\[ \ln Y_j - \ln Y_4 = \frac{\beta_1}{\beta_2} (\ln FS_4 - \ln FS_j) \tag{5} \]

Finally, taking the antilog of each side gives

\[ \frac{Y_j}{Y_4} = \left( \frac{FS_4}{FS_j} \right)^{\beta_1 / \beta_2} \tag{6} \]

or the ratios of income necessary to leave families
of size "j" as satisfied with their incomes as are
families of size four.

Introduction of an additional variable or
variables can be accomplished by inserting the terms
$\beta_3 C_1 + \cdots + \beta_k C_{k-2}$ from (1) in (3). The
result, after the simplification and rearrangement
represented by steps (5) and (6), is

\[ \frac{Y_j}{Y_4} = \left( \frac{FS_4}{FS_j} \right)^{\beta_1 / \beta_2} \Big/ \exp\left( \frac{\beta_3 C_1 + \cdots + \beta_k C_{k-2}}{\beta_2} \right) \tag{7} \]

the equivalence ratio net of the effect of a given
control variable(s) divided by the effect of that
variable (the exponentiated product of the variable
and its coefficient divided by the coefficient of
income).

The equivalence scales for non-aged and aged
families, taken directly from equations (6) and (7),
are shown below. Note that the values for the aged
are too low to be given much credence. This point
will be elaborated on further in the later sections
of the paper.

                      Age of reference person
Family size           Under 65     65 or over

One                       61
Two                       78
Three                     90
Four                     100
Five                     108
Six                      116
Seven                    122
Eight                    128
Nine                     134
Nine or more (a)         143

(--) = Not calculated.

(a) Based on an average family size of 10.9
for the nine and over group.

Results  for  the  non-aged  scale  compared  with  other 
approaches. 

Studies  using  direct  subjective  measures.--  Eight 
scales  estimated  using  direct  subjective  measures 
are  shown  in  table  1,  together  with  the  scale  first 
presented  in  this  paper.  Six  different  data  sets 
are  involved,  and  all  three  types  of  direct 
subjective  measures  are  represented:  living  level 
measures  (6  scales);  income  metric  measures  (1 
scale),  and  current  income  rating  measures  (2 
scales) . * 

Of  course  in  some  sense  the  scales  are  not  fully 
comparable.  Important  differences  in  study  design 
and  implementation  cannot  be  controlled  for.  Thus, 
it  would  be  unrealistic  to  expect  complete 
uniformity in the various scales. However, there
are important commonalities among the studies. For
example, all scales but [7], for which age effects
are controlled, pertain specifically to units headed
by an individual under age 65, and all but [8],
which  employed  family-size  dummies,  used  a  natural 
log  specification  for  unit  size. 

The  key  attribute  of  these  scales  is  their 
steepness.  The  formal  way  to  assess  their 
"steepness"  is  to  compare  their  income-unit  size 
elasticities,  that  is,  the  percentage  change  in 
income  required  to  compensate  for  a  given  percentage 
difference  in  family  size.8  The  elasticities  for  all 
nine  of  the  scales  are  given  at  the  bottom  of  table 
1.  Despite  all  the  differences  among  the  studies, 
there  does  seem  to  be  considerable  uniformity  among 
the  scales,  as  the  elasticities  for  eight  of  the 
nine  range  in  a  relatively  narrow  band  of  .31  to 
.39.  The  elasticity  for  the  "steepest"  scale  is 
.47,  indicating  that  it  is  on  the  order  of  20  -  50 
percent  "steeper"  than  the  others.8 

To  give  the  flavor  of  what  these  income-unit  size 
elasticities  mean  in  practical  terms,  the  eight 
"flatter"  scales  indicate  that  roughly  25  -  30 
percent  more  income  is  required  to  maintain 
equivalent   well-being  as   family  size  doubles  (as 
from 2 to 4 or 4 to 8 persons). For the BNS minimum
income  scale,  about  forty  percent  more  income  is 
required  to  yield  an  equivalent  level  of  welfare 
across  such  family  size  differences. 
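
The arithmetic behind these statements is short. Assuming a constant-elasticity scale, required income rises by the factor (size ratio) raised to the elasticity. The helper below (a hypothetical name, not from any cited study) evaluates that for a doubling of family size at the elasticities quoted in the text.

```python
def extra_income_pct(elasticity, size_ratio=2.0):
    """Percent more income needed when family size is multiplied by size_ratio."""
    return 100.0 * (size_ratio ** elasticity - 1.0)


# Ends of the .31 to .39 band, and the steepest scale at .47.
low = extra_income_pct(0.31)
high = extra_income_pct(0.39)
steepest = extra_income_pct(0.47)

# Relative steepness of the .47 scale against each end of the band.
steeper_than_high = 0.47 / 0.39 - 1.0
steeper_than_low = 0.47 / 0.31 - 1.0
```

The results are about 24 and 31 percent for the two ends of the band and about 39 percent for the steepest scale, in line with the rounded figures given in the text. The two ratios come to roughly 0.21 and 0.52, which matches the "20 - 50 percent" description.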

In sum, this preliminary version of an ISDP income
satisfaction  scale  appears  to  be  quite  similar  to 
other  "subjective"  scales  estimated  to  date  for  the 
U.S.7  There  also  appears  to  be  at  least  some 
regularity among the scales estimated so far on the
basis  of  direct  subjective  assessments.  How  do 
these  "subjective"  scales  compare  to  those  derived 
using  traditional  approaches? 

Comparisons  with  scales  based  on  dietary  need  and 
consumer  behavior. —  Seven  different  "objective" 
scales  are  shown  in  table  2.  Three  widely  used 
scales  are  shown  in  the  first  three  columns  of  the 
table.  The  Thrifty  Food  Plan,  developed  by  the  USDA 
(col.  2),  is  a  dietary  adequacy  scale  and  represents 
the  ratio  of  food  plan  costs  necessary  to  maintain 
equal  nutritional  adequacy  by  family  size  (Kerr  and 
Peterkin  1976;  Peterkin  1976).  The  Federal  Poverty 
Line  scale  (col.  1)  is  largely  based  on  an  earlier 
version  of  the  USDA  Thrifty  Food  Plan,  the  Economy 
Food  Plan  (Orshansky  1965).  As  reflected  in  the 
poverty  line  scale  values  for  units  size  five  and 
above,  it  was  slightly  less  steep  than  the  Thrifty 
Plan.  However,  the  most  pronounced  differences 
between  the  food  plan  scale  and  the  poverty  scale 
are  for  two,  and  more  especially  one-person,  units. 
As  noted  earlier  these  differences  stem  from 
intuitive adjustments made to the poverty scale in
order  to  account  for  supposed  "diseconomies" 
experienced  by  small  units  in  the  consumption  of 
nonfood  items.  The  BLS  scale  (col.  3)  was  derived 
by  estimating  the  total  expenditure  levels  at  which 
the percent of total expenditures devoted to food
remained constant by family size, i.e., it is a
classic Engel type scale (Jackson 1968). Interestingly,
it is very similar to the dietary adequacy
scale up through families of size five.

The Iso-prop scales shown in columns 4 and 5 are
also Engel scales, i.e., they were estimated by
holding constant expenditures for food or necessities
as a percentage of total consumption
expenditures  for  families  of  different  size  (Watts 
1967).  The  Iso-prop  food  scale  is  quite  similar  to 
the  Thrifty  Food  Plan  and  BLS  Family  Budget  scales. 
The  necessities  scale  differs  from  the  other  four 
scales  in  that  it  is  noticeably  less  steep. 

The  food-based  scales  indicate  that  a  family  of 
eight  would  need  about  seventy  percent  more  income 
than  a  four-person  family  to  be  equally  well-off. 
The  broader  based  necessities  scale  suggests  that 
the  family  of  eight  would  need  only  about  forty 
percent  more.  At  family  sizes  below  four,  of 
course,  the  necessities  scale  is  appreciably  more 
compressed  than  the  three  food  based  scales. 
Because  of  the  ad  hoc  adjustments  incorporated  into 
the  poverty  line  scale  for  units  size  one  and  two, 
at  the  lower  end  the  Iso-prop  necessities  scale  and 
the  poverty  scale  are  roughly  similar  to  each  other, 
suggesting,  perhaps,  the  essential  reasonableness  of 
those  adjustments. 
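
As a rough illustration of what "steepness" means in these comparisons, the percentages quoted above can be converted into a single constant-elasticity summary. The sketch below assumes a scale of the form (FS/4)^e, the form in which the elasticities at the foot of the tables operate; the function name is illustrative, and the inputs are the approximate figures quoted in the text rather than table entries.

```python
import math

def implied_elasticity(ratio, size, ref_size=4):
    """Elasticity e such that (size / ref_size) ** e equals ratio."""
    return math.log(ratio) / math.log(size / ref_size)

# "About seventy percent more" for eight persons versus four (food-based scales).
food_based = implied_elasticity(1.70, 8)
# "About forty percent more" for the broader Iso-prop necessities scale.
necessities = implied_elasticity(1.40, 8)

print(f"food-based scales:  e = {food_based:.2f}")
print(f"necessities scale:  e = {necessities:.2f}")
```

A lower elasticity means that income needs rise more slowly with unit size, which is the sense in which the necessities scale is "less steep."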

The  final  two  scales,  estimated  using  the 
extended  linear  expenditure  system  (ELES)  on  the 
basis  of  all  categories  of  current  consumption 
expenditures,  are  flatter  even  than  the  iso-prop 
necessities  scale  (van  der  Gaag  and  Smolensky  1982, 
Danziger,  van  der  Gaag,  Taussig  and  Smolensky  1984). 
The  contrast  between  the  ELES  scales  and  the  food 
based  scales  is,  quite  naturally,  even  more  marked. 

Table 2. Selected non-aged equivalence scales estimated using dietary standards,
food share, or more broadly based definitions of consumption

[Table body not recoverable.]

a Scale is for poverty level incomes and is net of age effects rather than
  for the non-aged per se.
b Weighted using ISDP population counts.
c Six or more.
d Not available.
e See note (c) table 1.
f Estimated using scale values for units of size 1-5 only.

The comparison, then, of these different types of
objectively based scales suggests that as one moves
away  from  scales  based  on  food  (whether  nominally 
grounded  in  dietary  adequacy  or  an  Engel  technique) 
to  scales  based  on  more  inclusive  categories  of 
consumption,  family  equivalence  scales  tend  to 
flatten  considerably;  perhaps  to  a  degree, 
represented  by  the  ELES  scales,  that  some  observers 
would  find  hard  to  accept  (Deaton  and  Muellbauer 
1980,  pp.  201-206).  Nonetheless,  the  pattern 
presented  is  rather  marked  and  unambiguous. 

Turning  now  to  a  comparison  of  the  direct 
subjective  scales  to  this  set  of  "objective" 
scales,  it  is  immediately  obvious  that  the 
subjective  scales  bear  little  resemblance  to  the 
food-based  scales  shown  in  table  2.  In  fact,  with 
the  exception  of  the  BNS  minimum  income  scale,  the 
subjective  scales  look  quite  similar  to  the  ELES 
scales,  the  flattest  in  the  objective  category.  The 
BNS scale, on the other hand, is virtually
indistinguishable from the Iso-prop necessities scale.
However,  since  there  doesn't  seem  to  be  any  readily 
identifiable  substantive  basis  for  its  greater 
"steepness",  this  doesn't  seem  to  be  especially 
noteworthy.8

Equivalence  and  the  aged 

One  of  the  main  issues  which  we  will  be  addressing 
in  this  project  concerns  the  economic  needs  of  the 
aged  both  in  comparison  to  the  non-aged  and  among 
major  subgroups  of  the  aged  population  itself. 

Table  3  contains  information  from  ten  different 
scales  which  distinguish  units  with  heads  age  65  or 
older.  Four  of  the  scales  were  derived  using 
objective  measures  (columns  1-4)  and  six,  including 
estimates  from  the  sixth  wave  ISDP  income 
satisfaction  item,  are  based  on  direct  subjective 
measures  (columns  5-10). 

There  are  three  important  questions  in  the  area  of 
aged  equivalence  which  are  addressed  by  information 
in the table: 1) Are the needs of the aged
systematically different from those of the non-aged?
2) How do the needs of aged one and two-person units
compare? 3) When living alone, i.e., in single-person
units, are the needs of aged men and women
different?

The  table  is  divided  into  two  parts.  The  first 
three  rows  of  the  table  deal  with  the  relative  needs 
of  one  and  two-person  units  and  males  and  females 
indexed  to  the  needs  of  a  non-aged  unit  of  size  2  = 
100.  Rows  2  and  3  show  the  equivalence  ratios  for 
males  and  females  living  alone  expressed  as  a 
percentage  of  a  non-aged  two-person  unit.  Row  1 
gives  the  average  for  single-person  aged  units 
weighted  on  the  basis  of  the  numbers  of  aged  men 
and  women  living  alone  in  the  noninstitutional 
population.  The  information  in  rows  one  to  three  is 
then  re-expressed  in  part  II  of  the  table  (the  last 
three  rows)  using  aged  units  of  size  two  =  100  as  a 
base.  Finally,  since  the  base  for  the  percentages 
in  part  one  of  the  table  is  non-aged  units  of  size 
two  =  100,  the  row  for  units  of  size  two  (row  4) 
isolates the overall "age" effect.9
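
The re-basing between the two parts of the table is simple arithmetic, sketched below. All numeric inputs are hypothetical placeholders rather than entries from table 3; the four-fifths weight reflects the approximate share of women among aged persons living alone cited later in the text, and the function names are illustrative.

```python
def weighted_single(men, women, share_women=0.8):
    """Average for one-person aged units, weighted by sex composition."""
    return (1.0 - share_women) * men + share_women * women

def rebase(values, base):
    """Re-express index numbers so that `base` becomes 100."""
    return {label: 100.0 * value / base for label, value in values.items()}

# Hypothetical part I entries (non-aged two-person unit = 100).
men, women = 45.0, 32.5
aged_two_person = 73.0  # hypothetical row 4, the overall "age" effect

part_one = {"one-person": weighted_single(men, women), "men": men, "women": women}

# Part II: the same information with the aged two-person unit = 100.
part_two = rebase(part_one, aged_two_person)
print(part_two)
```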

Turning  to  the  first  issue,  what  do  these  scales 
imply  about  the  needs  of  the  aged  compared  to  the 
non-aged? It is often asserted that the aged
require  less  than  the  non-aged  to  sustain  a 
comparable  level  of  living  (see,  for  example 
Danziger,  van  der  Gaag,  Smolensky  and  Taussig  1984). 
They  are  able  to  maintain  equivalent  nutritional 
status  at  lower  levels  of  food  consumption,  they 
often  live  in  fully  paid  for  owner-occupied  housing, 
no  longer  experience  the  expenses  associated  with 


working  and,  if  comparisons  are  made  on  the  basis  of 
before-tax  income,  generally  a  smaller  portion  of 
the  income  of  the  aged  is  devoted  to  taxes.  On  the 
other hand, they may experience greater out-of-pocket
health expenses, and concerns about economic
security could produce a shift in the focus of
economic preoccupations from consumption to savings.
What do these ten equivalence scales indicate
regarding the relative needs of the aged and
non-aged?

Indeed,  all  but  the  BNS  minimum  income  scale 
indicate  that  controlling  for  family  size,  the  aged 
require  less  than  the  non-aged  to  attain  an 
equivalent  level  of  economic  well-being.  Since  the 
standard  error  for  the  age  coefficient  underlying 
the  BNS  scale  is  virtually  as  large  as  the 
coefficient  (Colasanto,  Kapteyn  and  van  der  Gaag 
1984,  p.  134),  no  substantive  importance  need  be 
attached  to  its  failure  to  conform  to  the  general 
picture  of  relatively  lower  material  needs  for  aged 
families.  Apart  from  this  rather  commonplace  finding 
that  the  aged's  "needs"  are  less  than  those  of  the 
non-aged,  the  diversity  in  the  estimates  is  rather 
striking.  It  is  probably  noteworthy  that  the  two 
"food-based"  measures  yield  the  highest  aged/non- 
aged  ratios  (about  90).  The  two  variants  of  the 
ELES  scale,  while  lower  than  the  food  based  scales, 
differ  somewhat  from  each  other.  The  version 
denoted  as  ELES  I  (van  der  Gaag  and  Smolensky  1982) 
indicates  the  aged  require  only  73  percent  as  much 
as  the  non-aged  to  be  equally  well-off,  while  ELES 
II  (Danziger,  van  der  Gaag,  Smolensky  and  Taussig 
1984)  indicates  that  the  aged's  needs  are  84  percent 
of  those  of  the  non-aged. 

With  the  exception  of  the  BNS  scale,  which  can  be 
discounted  because  of  the  unreliability  of  the  age 
dimension  of  the  scale,  the  direct  subjective  scales 
are  all  at  or  below  75  for  the  aged.  Three  of  the 
five  are  in  the  64  to  75  range,  while  the  other  two, 
the  1972  SRC  and  1979  ISDP  income  satisfaction 
scales,  are  considerably  lower  (50  and  27 
respectively). In fact, the ISDP income satisfaction
scale is too low to be taken seriously.

Thus  it  would  seem  that  the  wide  range  presented 
by  the  estimates  precludes  any  useful  general 
statement  about  the  relative  needs  of  the  aged  and 
non-aged based on these studies. Apparently a good
deal more care is needed to lay out the nature of
the problem, to ensure that variables are defined in
a theoretically relevant manner, and to specify the
models correctly.

The  two  remaining  issues  addressed  in  table  3,  the 
possible  impact  of  gender  on  needs  and  the 
comparative  needs  of  one  and  two-person  aged  units, 
are  closely  related  and  central  to  aged  size 
equivalence  issues.  This  is  because  the  vast 
majority  of  aged  individuals  live  alone  or  with  just 
one  other  person  and  approximately  four-fifths  of 
aged  individuals  living  alone  are  women. 

Gender  effects. — Five  of  the  ten  scales  provide 
distinctions  by  sex  of  head  as  well  as  age.  Four 
indicate  that  the  needs  of  aged  women  living  alone 
are  less  than  those  of  aged  men.  In  these  four 
studies,  estimates  of  women's  needs  vary  from  only 
very  slightly  less  than  men's  (viz.  the  U.S.  poverty 
thresholds  as  constituted  prior  to  1981)  to 
substantially  less  (the  two  ELES  scales  and  the  ISDP 
minimum  income  scale). 

While  the  Wisconsin  BNS  minimum  income  scale 
indicates that women's needs are greater, the
standard  error  for  the  sex  of  head  coefficient  is 
very  large   in  comparison  with   the  coefficient 


(Colasanto,  Kapteyn  and  van  der  Gaag  1984,  p.  134). 
Some limited experimentation with universe and
variable specifications with the ISDP data has
produced coefficients suggesting greater needs for
female heads also, and in some cases the effect was
even statistically significant. However, given the
universe definition employed in estimating the
present version of the scale, sex of head was not
statistically significant and, as noted earlier, was
dropped from the model.

Thus,  notwithstanding  the  large  gender  effects 
shown  in  some  of  the  scales,  it  seems  we  are  some 
way  from  being  able  to  demonstrate  a  consistent  sex 
effect,  much  less  reliably  gauge  its  magnitude.  And 
even  if  its  presence  could  be  firmly  established, 
its  precise  significance  would  be  far  from  clear 
because of the very low incomes of many aged women.

One  versus  two-person  units. —  Given  the 
problematic  nature  of  gender  effects,  the  question 
of one-person/two-person unit equivalence among the
aged is necessarily somewhat clouded. The one to
two-person ratios for the ten scales are distributed
over a considerable range (see row 5 of the table).
Six of the scales indicate that the needs of an aged
person living alone are between 70 and 79 percent of
an aged couple's needs. Indeed, five of the ten
scales  yield  one-person/two-person  ratios  in  the 
range  of  76-79.  The  value  for  the  Federal  poverty 
line  (79)  places  it  in  this  group  of  five.  However, 
the  ELES  I  and  BLS  scales  suggest  a  nearly  per 
capita  relationship  (48  and  55).  The  ELES  II  and 
ISDP  minimum  income  scales  are  also  noticeably  below 
the  group  of  six  (61  and  63  respectively). 

The  sex  effect  is  evident  especially  in  the  two 
ELES  scales  and  the  ISDP  minimum  income  scale  and 
would  appear  to  be  largely  responsible  for  the  fact 
that  the  overall  one-person/two-person  ratios  for 
these  scales  are  so  low.  Again,  a  good  deal  more 
work  is  needed  to  understand  the  basis  of  such 
widely  varying  estimates. 

Some  elaborations  of  the  initial  model. 

Further  exploration  of  the  age  effect  and  the 
specification  of  the  family  size  variables  is 
obviously  called  for.  While  in  no  sense  definitive, 
a  minimal  effort  was  possible  after  return  from  the 
meetings.  Some  interesting  results  were  obtained 
and  will  be  reported  on. 

Four basic modifications were made to the naive
model that we estimated initially: 1) an
interaction term was added for family size and
income, 2) a dummy variable was introduced for one-person
aged units, 3) a substantially expanded set
of control variables thought to be related to
aged/non-aged differences was added, and 4) a crude
approximation of after-tax income was substituted
for the before-tax variable.

The  family  size/income  interaction  term  was  added 
principally  to  test  if  family  size  effects  vary  by 
income  level.  This  possibility  has  been  commonly 
noted  and  has  received  some  empirical  support  (e.g. 
Watts  1967,  Seneca  and  Taussig  1971,  and  Deaton 
1982).  Both  theory  and  empirical  evidence  suggest 
that  the  addition  of  family  members  in  better  off 
families  requires  a  smaller  proportional  increase  in 
expenditures  (e.g.,  Deaton  1982,  p.  43).  Thus,  we 
might  expect  a  size  equivalence  scale  for  low  income 
families  to  be  "steeper"  than  a  scale  for  families 
with  incomes  in  the  middle  or  upper  part  of  the 
distribution. 

The  other  modifications  were  made  principally  to 
find  out  something  more  about  the  nature  of   the  age 


effect.  The  dummy  variable  for  aged  single-person 
units  was  added  because  aged  singles  confront  very 
different  conditions  from  aged  couples.  They  tend 
to be older and have notably lower incomes. If
their reduced circumstances were to be systematically
related to lowered aspiration levels,
specifications  assuming  a  constant  age  effect  by 
unit  size  could  prove  distortive  and  might,  in  part, 
account  for  the  excessively  large  age  effect 
estimated  from  the  original  simple  model.  The 
remaining  modifications  were  undertaken  in  order  to 
see  to  what  extent  the  age  effect  in  the  initial 
model  could  be  explained  by  taking  into  account  some 
of  the  factors  which  might  contribute  to  lower 
before-tax money income needs for the aged. Given
the  variables  at  hand  on  the  analysis  file,  home 
tenure,  number  of  earners,  value  of  durables,  and 
amount  of  other  fungible  assets  (excluding  own  home) 
could be added to the initial model. As a matter of
general  interest,  a  dummy  for  work  disability  among 
the  non-aged  was  also  incorporated.  Finally,  a  few 
runs were made to experiment with the impact of
using  after-tax  as  opposed  to  before-tax  income  as  a 
predictor variable. The results are given in table
4.10

Results  for  the  non-aged. — The  interaction  term 
was included in two extensions of the naive model:
as a sole addition to the initial variable set
(Model  II),  and  together  with  the  dummy  for  aged 
units  of  size  1  (Model  IV).  The  result  is  given  in 
table  4  (Models  II  and  IV).  The  coefficient  for  the 
interaction  term  has  the  expected  sign  in  each 
instance. It is positive, indicating that the lower
the  income  level,  the  higher  the  elasticity  of 
income  with  respect  to  family  size  and  the  steeper 
the  resulting  equivalence  scale.  However,  the 
magnitude  of  the  coefficient  is  affected  by  the 
other  variables  which  were  added  to  explore  the  age 
effect.  For  example,  when  the  dummy  variable  for 
aged  units  of  size  one  and  the  family  size 
interaction  term  are  both  present  (Model  IV),  the 
interaction  effect  is  halved  and  the  t-value  drops 
from  the  margin  of  significance  at  the  .05  level 
(single-tailed  test)  to  well  below  conventional 
levels  of  significance  (Model  IV  compared  to  Model 
II).  This  would  seem  to  indicate  that  a  good  deal 
of the gross interaction effect is attributable to
the  special  circumstances  of  one-person  aged  units. 
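
The significance statements above can be checked approximately from the t-values reported in table 4. The sketch below is only an approximation: it assumes that, with roughly 5,000 cases, the t distribution is close enough to the standard normal, and the function name is illustrative.

```python
import math

def one_tailed_p(t_value):
    """Upper-tail probability under the standard normal approximation."""
    return 0.5 * math.erfc(t_value / math.sqrt(2.0))

p_model_ii = one_tailed_p(1.61)  # interaction term, Model II
p_model_iv = one_tailed_p(0.76)  # interaction term, Model IV

print(f"Model II: p = {p_model_ii:.3f}")
print(f"Model IV: p = {p_model_iv:.3f}")
```

Under this approximation the Model II value falls just above .05, while the Model IV value is far from any conventional threshold.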

The  impact  of  the  interaction  term  on  equivalence 
scales  per  se  for  the  non-aged  is  shown  in  table  5. 
Coefficients  from  Model  II  (the  interaction  term 
only  added  to  the  initial  set  of  variables)  and 
Model  IV  (the  interaction  term  and  the  dummy  for 
aged  units  of  size  one  both  added  to  the  initial 
variable  set)  were  used  to  construct  scales  at  four 
different  reference  income  levels  keyed  to  four- 
person  families  (the  average  income  of  poor 
families,  the  weighted  poverty  threshold  income, 
median  income  and  twice  the  median,  all  in  terms  of 
four-person  family  incomes).11  The  scales  derived 
from  the  initial  model  and  Model  III  (the  initial 
model  plus  the  term  for  aged  units  of  size  one)  are 
included  in  the  table  for  purposes  of  comparison. 

Clearly  the  elasticity  of  income  with  respect  to 
family  size  varies  noticeably  by  income  level.  The 
elasticity  estimated  from  the  initial  model  was  .36. 
Using  Model  II,  the  average  elasticity  ranges 
between  .49  for  families  living  in  poverty  to  .19 
for  those  living  at  satisfaction  levels  experienced 
by  four-person  families  with  incomes  at  twice  the 
median  for  four-person  families. 
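
The Model II scales in table 5 can be approximately reproduced from the coefficients in table 4. The sketch below rests on our reading of the model rather than on anything stated explicitly: satisfaction is taken to be linear in lnFS, lnY and their product, income is measured in dollars per month as in the table 5 key, and the "average elasticity" is taken to be the change in log equivalent income between units of size one and eight divided by ln 8. Function names are illustrative.

```python
import math

# Model II coefficients as printed in table 4.
B_FS, B_Y, B_INT = -0.0791, 0.0565, 0.0082

def equivalent_income(size, ref_income, ref_size=4):
    """Income at which a unit of `size` is as satisfied as the reference unit."""
    ln_ref_fs, ln_ref_y = math.log(ref_size), math.log(ref_income)
    # Reference satisfaction, net of terms that do not vary with size or income.
    target = B_FS * ln_ref_fs + B_Y * ln_ref_y + B_INT * ln_ref_fs * ln_ref_y
    ln_fs = math.log(size)
    return math.exp((target - B_FS * ln_fs) / (B_Y + B_INT * ln_fs))

def scale(ref_income):
    """Equivalence scale for unit sizes 1-8, four-person unit = 100."""
    return {n: 100.0 * equivalent_income(n, ref_income) / ref_income
            for n in range(1, 9)}

def average_elasticity(ref_income):
    """Assumed definition: log-income change from size 1 to 8 over ln 8."""
    y1 = equivalent_income(1, ref_income)
    y8 = equivalent_income(8, ref_income)
    return math.log(y8 / y1) / math.log(8)

ap_scale = scale(392.0)  # AP: average income of poor four-person families
print({n: round(v) for n, v in ap_scale.items()})
print(round(average_elasticity(392.0), 2), round(average_elasticity(3763.0), 2))
```

Under these assumptions the AP row comes out within a point of the published values, and the average elasticities at the AP and 2M income levels are close to the published .49 and .19. Small discrepancies at other income levels are to be expected because the published coefficients are rounded.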

Table 4. Coefficients for the initial model and selected alternative
specifications

                                    Selected model coefficients (a)
                         Initial    II (b)   III (c)    IV (d)     V (e)    VI (f)   VII (g)

Intercept                  .1401     .1825     .1278     .1499     .1954     .1734     .1583
                          (6.24)    (5.27)    (5.39)    (4.06)    (7.70)    (6.71)    (5.69)
lnFS (β1)                 -.0224    -.0791    -.0189    -.0474    -.0230    -.0283    -.0252
                         (-5.01)   (-2.03)   (-4.08)   (-1.26)   (-4.77)   (-6.04)   (-5.23)
lnY (β2)                   .0628     .0565     .0640     .0608     .0481     .0578     .0596
                         (19.59)   (11.71)   (19.80)   (11.39)   (12.07)   (15.08)   (15.34)
lnFS*lnY                     ...     .0082       ...     .0041       ...       ...       ...
                                    (1.61)              (0.76)
Perceived financial change (1979 vs. 1978):
  Better (β3)              .0715     .0715     .0711     .0711     .0752     .0755        --
                         (11.08)   (11.08)   (11.00)   (11.01)   (11.85)   (11.63)
  Worse (β4)              -.1139    -.1138    -.1137    -.1136    -.1116    -.1141        --
                        (-21.01)  (-21.00)  (-20.96)  (-20.96)  (-20.92)  (-20.88)
Expected financial change (1980 vs. 1979):
  Better (β5)             -.0286    -.0291    -.0283    -.0286    -.0231    -.0253        --
                         (-4.23)   (-4.30)   (-4.18)   (-4.21)   (-3.48)   (-3.70)
  Worse (β6)              -.0508    -.0507    -.0508    -.0508    -.0522    -.0499        --
                         (-9.53)   (-9.52)   (-9.53)   (-9.52)   (-9.98)   (-9.29)
Aged (β7)                  .0763     .0751     .0642     .0646     .0416     .0722        --
                         (11.83)   (11.59)    (8.35)    (8.38)    (2.84)   (11.12)
Aged, 1-person unit          ...       ...     .0320     .0293     .0278       ...        --
                                              (2.90)    (2.53)    (2.39)
R²                          .263      .263      .264      .264      .297      .253        --

Number of cases = 5,067

Note: t-values in parentheses. "..." = variable not in the model;
"--" = entry not legible in this copy.

a Coefficients for the additional control variables incorporated in Model V
  not included in the table. See text and text footnote 10.
b Initial model plus interaction term for income and family size.
c Initial model plus a dummy variable for aged single-person units.
d Initial model plus interaction term for income and family size and a dummy
  variable for aged single-person units.
e Initial model plus a dummy variable for aged single-person units, plus
  the additional set of control variables.
f Initial model substituting after-tax for before-tax income.
g Initial model substituting after-tax for before-tax income plus a
  dummy variable for aged single-person units.

Estimates based on Model IV also reveal important
differences in elasticity by income level. While
the  interaction  coefficient  is  not  significant 
statistically, it does produce marked differences in
the scale. Comparing the extremes, for example, the
family size elasticity for poor families is about
1.6 times that for families at the highest
level.

In  sum,  the  data  set  does  offer  some  support  for 
the  notion  of  interaction  between  income  and  family 
size  and  illustrates  that  the  effect  could  be  of 
considerable  practical  significance.  In  fact,  the 
contrast  noted  earlier  in  steepness  between  food- 
based  scales  and  those  estimated  on  the  basis  of 
broader  consumption  sets  might  be  attributable,  in 
part,  to  the  failure  to  appropriately  account  for 
such  interaction  effects.  However,  a  great  deal  of 
the  interaction  seems  to  arise  from  the  special 
circumstances  of  aged  one-person  units.  After 
removing  this  effect,  the  interaction  term  still 
yields  scales  with  noticeably  different  elasticities 
by income level, but the interaction term is not
statistically  significant.  Thus,  in  future  work  it 
would  seem  advisable  to  explicitly  test  for 
interaction  effects  of  this  type.  Other  more 
sophisticated  ways  of  pursuing  possible  interaction 
effects  are  available  as  well,  and  should  be 
considered  (see  for  example,  Deaton  1982,  p.  41). 

Impact on estimates for the aged. — The most
consistent and important impact on age effects and,
consequently, aged equivalent income, is attributable
to the addition of the dummy variable for aged
units  of  size  one  (see  tables  4  and  6).  The  sign  is 
in  the  expected  direction,  and  the  coefficient  is 
statistically  significant  in  each  of  the  four  models 
which  include  the  variable  (III,  IV,  V  and  VII). 
When  the  dummy  is  present,  it  leads  to  three 
results.  The  estimate  for  the  aged  equivalent 
income  is  lower  for  one-person  than  two-person 
units.  One-person  unit  equivalent  income  is  also 
lower  than  estimated  on  the  basis  of  the  initial 
model.  Finally,  the  ratio  for  two-person  units 
increases  somewhat  over  the  initial  model,  but  not 
dramatically  (from  about  30  to  about  36  to  37  when 
scaled  to  non-aged  incomes  =  100).  These  results 
suggest a difference in the income-needs relationship
(needs, of course, as measured by income satisfaction)
between one and two-person aged units, but
not  one  large  enough  to  explain  away  the  very  large 
age  effect  uncovered  in  the  initial  model. 

The attempt to net out the effects of work activity
and assets (including owner-occupied housing)
had some slight additional impact. All additional
variables  were  of  the  expected  sign,  i.e.,  the  more 
assets  the  less  income  required  to  reach  a  given 
level  of  income  satisfaction;  earnings  activity  and 
nonwork  due  to  disability  among  the  non-aged  were 
associated  with  less  satisfaction  at  a  given  level 
of before-tax income.12 However, the combined
effect  of  the  additions  was  not  marked  and  was  most 
evident  for  two-person  units  where  aged  equivalent 
income  increased  from  37  (Model  III)  to  42  (Model 
V). 
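
The aged equivalent incomes cited above follow from the table 4 coefficients if one assumes, as we do in this sketch, that satisfaction is linear in lnY and the aged dummy, so that equal satisfaction at a given unit size implies an aged/non-aged income ratio of exp(-β7/β2). For Models III and V the figure applies to aged units other than single persons, since those models carry a separate dummy for aged one-person units. Names in the code are illustrative.

```python
import math

# (income coefficient β2, aged coefficient β7) as printed in table 4.
MODELS = {
    "Initial": (0.0628, 0.0763),
    "III": (0.0640, 0.0642),
    "V": (0.0481, 0.0416),
}

def aged_equivalent_income(b_income, b_aged):
    """Aged income as a percent of non-aged income at equal satisfaction."""
    return 100.0 * math.exp(-b_aged / b_income)

ratios = {name: aged_equivalent_income(*coefs) for name, coefs in MODELS.items()}
for name, value in ratios.items():
    print(f"Model {name}: {value:.0f}")
```

The results are close to the values of about 30, 37 and 42 quoted in the text for the initial model, Model III and Model V.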

Allowing  for  family  size/income  interaction  had 
slight,  if  any,  effect  when  taken  alone  (Model  II), 
and had no visible impact when present in conjunction
with the dummy for aged units of size one
(Model  IV  vs.  Model  III).  Use  of  the  approximation 
of  after-tax  income  also  had  very  little  net  effect 
(Initial  model  vs.  Model  VI  and  Model  III  vs.  Model 
VII)  but  the  results  were  somewhat  mixed.  The 
coefficient for the age 65 plus dummy (β7) did
decline slightly (by about 5-6 percent), but the
coefficient for income (β2) dropped somewhat more
(about 10-11 percent). As a result, and contrary to
expectations, aged equivalent income on an after-tax
basis is essentially the same as on a before-tax
basis. Since after-tax income should be more
closely tied to satisfaction than before-tax income,
one would have expected the income coefficient to be
larger on an after-tax than on a before-tax basis.
This contrary
finding  is  somewhat  puzzling  and  may  arise  from  the 
somewhat  crude  simulation  of  after-tax  income 
employed. 

The  main  point  raised  earlier  in  the  discussion  of 
table  3  is  little  affected.  The  aged/non-aged 
equivalence  ratios  stemming  from  the  sixth  wave 
ISDP  income  satisfaction  measure  remain  the  lowest 
of  any  considered  and  do  not  appear  to  be  credible. 
Almost  certainly  an  explanation,  if  one  is  to  be 
found,  lies  along  different  avenues  than  those 
touched  on  here. 

At  the  same  time,  and  not  surprisingly,  the  dummy 
variable  for  aged  units  of  size  one  did  have  a 
marked effect on the one-person/two-person equivalence
ratios for the aged. Whereas the initial
model yielded a one vs. two-person equivalence ratio
of 78, the models including the extra dummy for aged
one-person  units  (again  Models  III,  IV,  V  and  VII) 
yielded  ratios  of  no  more  than  50.  Of  the  ten 
scales  considered  earlier,  only  the  one  denoted  ELES 
I  yielded  such  a  low  value. 
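
The drop in the one-person/two-person ratio can be traced to the table 4 coefficients. The sketch below assumes the same linear-in-logs satisfaction equation as before, without the interaction term, so that the ratio is the product of a pure size effect, 0.5 raised to the size elasticity, and the effect of the aged one-person dummy. The function and argument names are illustrative.

```python
import math

def one_to_two_ratio(b_fs, b_income, b_aged_single=0.0):
    """Aged one-person equivalent income as a percent of aged two-person income."""
    size_effect = 0.5 ** (-b_fs / b_income)              # unit of size 1 versus 2
    single_effect = math.exp(-b_aged_single / b_income)  # aged one-person dummy
    return 100.0 * size_effect * single_effect

initial = one_to_two_ratio(-0.0224, 0.0628)
model_iii = one_to_two_ratio(-0.0189, 0.0640, 0.0320)

print(f"Initial model: {initial:.0f}")
print(f"Model III:     {model_iii:.0f}")
```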

Summary  and  conclusions. 

This  paper  has  provided  a  general  discussion  of 
the  use  of  direct  subjective  assessments  of  income 
in equivalence scale applications and has presented
a heuristic empirical example of the estimation
of  an  equivalence  scale  using  such  measures. 

The  scale  estimated  for  this  paper  was  compared 
with  scales  obtained  by  others  using  direct 
subjective  measures  and  more  traditional  approaches. 
The  comparisons  focused  on  four  aspects  of 
equivalence:  1)  the  impact  of  family  size  on  needs 
among  non-aged  families,  2)  the  relative  needs  of 
aged  and  non-aged  families,  3)  gender  effects  among 
the  aged,  and  4)  the  needs  of  one  and  two-person 
aged  units. 

The  family  size  issue  was  reviewed  in  terms  of  the 
steepness  of  the  scales.  As  a  group,  the  scales 
based  on  direct  subjective  measures  appeared  to  be 
noticeably  less  steep  than  scales  based  on  dietary 
needs  or  food  consumption  and  rather  similar  to 
scales  based  on  more  broadly  defined  consumption 
sets  such  as  necessities  and  especially  total 
current  consumption,  i.e.,  the  two  ELES  scales. 
However, some limited experimentation with alternative
specifications of the model estimated in this
paper  suggested  that  this  contrast  may  be  due  in 
part  to  a  failure  to  take  proper  or  full  account  of 
possible  interaction  between  family  size  and  income. 

Review  of  the  findings  concerning  equivalence 
issues  for  the  aged  was  inconclusive.  While  there 
was  a  clear  tendency  for  the  needs  of  the  aged,  as 
defined  in  these  studies,  to  be  less  than  the  needs 
of  the  non-aged,  estimates  of  the  magnitude  of  the 
needs  differential  varied  widely.  No  consistent 
pattern  was  discernible  in  regard  to  the  other  two 
issues central to the question of aged equivalence
(gender effects and the needs of one vs. two-person
aged units).

Table 5. Non-aged unit size equivalence scales based on selected alternative
specifications of the initial model

                                      Unit size
Model (a)             1     2     3     4     5     6     7     8   IFSE (b)

Initial model        61    78    90   100   108   116   122   128    .36

Model II, by income level:
  AP ($392/m)        48    71    88   100   110   119   126   133    .49 (c)
  PT ($616/m)        52    74    89   100   109   116   123   128    .43 (c)
  M ($1,882/m)       65    82    93   100   106   110   114   118    .28 (c)
  2M ($3,763/m)      77    88    95   100   104   107   109   112    .19 (c)

Model III            66    81    92   100   107   113   118   123    .30

Model IV, by income level:
  AP ($392/m)        59    78    90   100   108   115   121   126    .36 (c)
  PT ($616/m)        62    79    91   100   107   113   119   124    .33 (c)
  M ($1,882/m)       68    83    93   100   106   110   115   118    .26 (c)
  2M ($3,763/m)      73    86    94   100   105   109   111   115    .22 (c)

INCOME LEVEL KEY:
AP - 1/12 the average annual income of poor 4-person families, 1979.
PT - 1/12 the annual poverty threshold for 4-person families, 1979.
M  - 1/12 the median income of 4-person families, 1979.
2M - 1/12 twice the median income of 4-person families, 1979.

a For description of models see notes to table 4 and text.
b Income-family size elasticity.
c Average elasticity. Elasticity for the lower part of the scale is higher than
  for the upper part of the scale.

[Table 6: body not recoverable. Surviving row labels: One-person; Two-person.]

a See notes to table 4 and text for description of models.
b At a welfare level equivalent to that of four-person families with
  incomes at the median for four-person families.

It would seem that before a reasonably orderly set
of empirical results can be expected to emerge for
the aged much more work will have to be done to
refine existing independent variables and introduce
new ones and to develop more appropriate models.
Likewise,  considerable  additional  work  remains  to  be 
done  in  order  to  develop  the  equivalence  scale 
presented  here  beyond  the  frankly  preliminary  stage. 

Footnotes. 

*Dan  Radner,  Ben  Bridges,  Clarise  Lancaster,  Arie 
Kapteyn  and  Steve  Dubnoff  provided  a  number  of 
very  helpful  suggestions  and  criticisms.  My 
thanks  to  Sharon  Johnson  for  working  through  the 
complicated  fifth  and  sixth  wave  ISDP  files  to 
produce  the  matched  analysis  extract.  Ray  Becker 
and  Mike  Rozdilski  also  assisted  with  the  data 
processing,  and  Weltha  Logan  helped  with  editorial 
details. 

The  algorithm  used  to  simulate  after-tax  income  for 
the  1984  Proceedings  version  of  this  paper 
contained keying errors which had a substantial
effect on some of the parameter values (namely the
intercept and the coefficients for family size and
income) for the models which incorporate after-tax
income. Fortunately, the errors tended to offset
each other and so their net impact on aged/nonaged
equivalence  was  slight.  These  errors  have  been 
corrected  in  the  current  version  of  the  paper. 

1 The author encourages interested parties to
contact him at the following address: Social
Security Administration, 1875 Connecticut Ave.,
N.W., Rm. 320-N, Washington, D.C. 20009.

2 An alternative specification included female head
as  a  control  variable.  The  sign  was  negative,  but 
not  significant  (t  =  -0.81).  The  model  was 
reestimated  with  the  five  control  variables 
listed. 

3 Based on comparison of the sums of squares
uniquely  associated  with  the  model  variables  (as 
gauged  by  successively  adding  each  variable  as 
the  last  in  the  model),  the  sums  of  squares  due  to 
current  income  and  family  size  amount  to  only 
about  60  percent  of  the  sum  of  squares  associated 
with  the  four  subjective  change  variables.  The 
powerful  effect  of  perceived  financial  change  on 
current  satisfaction  with  income,  especially 
change  for  the  worse,  has  been  documented  earlier 
by  Vaughan  and  Lancaster  (1980)  and  Dubnoff, 
Vaughan  and  Lancaster  (1981). 

⁴Sources for the scales in table 1 are as follows:
[1] Rainwater (1974), tables 5-2 and 5-4, pp.
102, 105; [2] & [3] Dubnoff and Strate (1984),
table 2, p. 18; [4] & [7] Colasanto, Kapteyn and
van der Gaag (1984), pp. 127-138; [5] Danziger, van
der  Gaag,  Taussig  and  Smolensky  (1984),  table  2, 
p.  53;  [6]  Dubnoff  (1982),  table  1,  p.  10;  [8] 
Dubnoff,  Vaughan  and  Lancaster  (1981),  table  2,  p. 
351; [9] This paper.

⁵The exponent on the right hand side of equation
(6)  is  in  fact  the  negative  elasticity.  The 
elasticities  appearing  at  the  foot  of  tables  1  and 
2  function  in  the  same  fashion,  but  because  they 
are  positive,  they  operate  on  the  ratio  FSj/FS4 
instead  of  FS4/FSj  as  in  (6). 

⁶While the labeling of the BNS minimum income scale
as  "steeper"  than  the  others  is  somewhat 
arbitrary,  its  elasticity  is  more  than  two 
standard  deviations  above  the  mean  elasticity  of 
the  nine  scales  presented  in  the  table. 

⁷The fact that this estimate is preliminary is to
be stressed. Limited experimentation with alternative
models and variable definitions, some of
which is reported on in the last section of the
paper, indicates that a range of elasticities may
be obtained. At this early stage of our research
we have not developed a clear basis for selecting
one of these alternatives over another.

⁸In light of the findings reported in the last
section of the paper, perhaps this statement is
too unequivocal. Unlike most of the other scales,
the BNS minimum income scale pertains to a poverty
level income, and my own analysis yields an
average elasticity of .43 for a poverty threshold
welfare level when the dummy for aged units of
size one is left out of the model (the model for
the BNS scale contained no such dummy). Clearly,
the Dubnoff-Strate results indicate that subjective
scales estimated for poverty income welfare
levels are not necessarily "steeper". One could
also protest that the ISDP minimum income scale is
not "steeper" than the others, but it is also true
that the income levels associated with the scale
are very close to the median rather than at the
poverty level. In any case, the difference
between the BNS minimum income scale and the BNS
WFI scale, essentially estimated at median income,
has also been encountered in Dutch studies
(Goedhart, et al., 1977), although the difference
is not as pronounced. Until recently, members of
the original Leyden group had not made much of the
difference, but lately some attention has been
given to this matter (personal communication, Arie
Kapteyn).

⁹Strictly speaking, this statement is only true for
scales  in  which  age  effects  do  not  vary  by  unit 
size.  This  condition  holds  for  all  scales 
estimated  with  a  dummy  specification  for  age,  as 
is  the  case  for  all  but  two  of  the  scales.  While 
the  poverty  line  age  effect  was  not  explicitly 
derived  on  the  basis  of  a  dummy  specification,  the 
aged/non-aged  ratios  for  one  and  two-person  units 
are  the  same  (.90).  However,  for  the  BLS  scale 
the  aged/non-aged  equivalence  ratio  for  1-person 
units is .72, as opposed to the ratio of .93 based
on  a  comparison  of  two-person  units  as  shown  in 
the table.

¹⁰An after-tax specification of income is clearly
the  variable  of  choice  for  most  applications,  but 
the  available  variable  assumes  use  of  the  standard 
deduction  only  for  the  Federal  income  tax  and  thus 
underestimates  after-tax  income  levels  for  about 
the  upper  third  of  the  distribution.  As  this 
project  advances  we  hope  to  develop  a  better 
measure  of  after-tax  income  and  employ  it  more 
generally  in  our  analyses. 

¹¹The income (Y) for a family of size "j" equivalent
to that of a given reference level income for a
family of four (Y4) may be derived from the
exponentiated  value  of  the  following  expression: 

P,(lnPSt  -  lnFSj     i„v:il  ♦  -4-rinFS.n 


where δ is the coefficient for the family
size/income  interaction  term.  Details  on  the 
derivation  of  this  formula  are  available  from  the 
author. 

¹²The t-values for all but one of the additional
variables (living in means-tested public or
publicly subsidized housing with a t-value of
1.41) range between 1.8 and 7.8. While the magnitude
of a number of the coefficients seems rather
large, this is probably attributable, at least in
part, to the fact that the income definition
employed in Model V is before-tax and so variables
for homeownership and work presumably reflect tax
effects in addition to whatever independent
effects they may exert.

Courtenay Slater, CEC Associates

The number and variety of SIPP papers being
presented at these meetings, in advance of
anyone actually having seen any data from SIPP,
certainly whet one's appetite for what is to
come once data actually become available. The
three  papers  on  which  I  will  be  commenting 
raise  two  general  points  to  which  I  want  to 
draw  attention.  One  is  the  special  difficulty 
of  measuring  the  wealth  of  the  wealthy.  The 
other  concerns  the  complexities  of  defining 
income  and  the  importance  of  taking  noncash 
income into account. A category of noncash
income  deserving  particular  attention  is  the 
implicit  income  derived  from  owning  and 
occupying  one's  own  home. 


Measuring  the  Wealth  of  the  Wealthy 

The  Lamas  and  McNeil  paper  presents  evidence 
of  excellent  response  rates  being  obtained  in 
the  first  few  rounds  of  SIPP  data  collection. 
It  also  illustrates  the  first  of  the  general 
points  I  want  to  make,  the  one  about  measuring 
the  wealth  of  the  wealthy. 

SIPP will provide better data on the assets
of  lower  and  middle  income  groups  than  on  those 
held  by  the  wealthy.  This  is  not  a  criticism 
of  SIPP  —  a  single  survey  cannot  do  absolutely 
everything. Rather, it is a caution to be
borne  in  mind  about  the  need  for  careful 
studies  to  evaluate  the  quality  of  the  asset 
data.   And  it  may  even  be  an  argument  for  the 
occasional  use  of  enhanced  samples  of  upper 
income  households. 

There  are  several  reasons  for  anticipating 
diminishing  accuracy  of  the  asset  data  as  one 
moves  up  the  income  scale.  The  assets  of  the 
wealthy are more varied and held in more
complicated  forms.  The  chances  of  forgetting 
some  assets  increase  and  so  do  the  incentives 
for  concealment.  The  wealthy  are  habituated  to 
not  telling  the  government  any  more  than  they 
must. 

In  addition,  the  variety  of  forms  in  which 
the  assets  can  be  held  and  the  rapidity  with 
which  new  investment  vehicles  evolve  make  it 
impossible  for  the  SIPP  questionnaire  to 
specifically  cover  them  all.  The  SIPP 
questions  are  detailed  and  comprehensive,  but 
there  will  always  be  some  new  forms  of 
commodity  futures,  stock  options,  or  other 
esoteric  investments  which  won't  get  picked  up, 
or  which  won't  be  accurately  valued. 

The  pattern  of  response  rates  described  in 
the  Lamas-McNeil  paper  reinforces  one's  prior 
expectation  of  poorer  response  rates  among  the 
wealthy.  Overall,  response  rates  promise  to  be 
very good. I say "promise" because the Lamas-McNeil
data on response rates are based in part
on a pretest, together with an assumption
that, if callbacks had been completed in the
pretest (which they were not), the information
would have been supplied in a high percentage
of cases.


Let  us  suppose,  however,  that  actual 
response  rates  equal  the  Lamas-McNeil 
projections.  Nonresponse  among  those  ages  45 
to  64  will  be  about  three  times  as  great  as  for 
those  under  age  34.   Nonresponse  for  college 
graduates  will  be  twice  that  of  those  with  less 
than  a  high-school  education.   In  other  words, 
the  age  and  education  groups  known  to  have  the 
highest  average  incomes  have  the  highest 
nonresponse  rates. 

These  response  rates  pertain  to  asset 
ownership.   When  the  questions  about  asset 
amounts  were  asked,  nonresponse  was  especially 
high  regarding  stocks  and  bonds,  types  of 
assets  held  predominately  by  upper  income 
households. 

To  sum  up,  the  percentage  of  total  wealth 
missed  by  the  SIPP  survey  is  likely  to  be 
substantially  greater  than  the  percentage  of 
nonrespondents  to  the  survey.  There  will  be  a 
need  for  studies  which  can  quantify  this  gap. 

A  further  problem  with  respect  to  valuing 
and  interpreting  assets  is  that  asset  values 
fluctuate. Values of some assets, such as
common stocks, can fluctuate suddenly and by
large amounts. The Dow Jones Industrial
Average increased 52 percent from June 1982 to
June  1983.   This,  of  course,  was  a  genuine 
increase  in  wealth  for  stock  holders  and  one  of 
which  they  generally  were  aware.   But  the  8 
percent  rise  in  the  Dow  between  July  30  and 
August  2,  1984  may  or  may  not  turn  out  to  have 
any  meaningful  degree  of  permanence. 
Individual  stocks  can  fluctuate  by  far  more 
than  the  market  averages.   How  accurately  will 
survey  respondents  be  able  (and  willing)  to 
value  assets  such  as  common  stocks?   And  how 
meaningful  would  an  accurate  valuation  as  of  a 
specific  date  be  anyway? 

Changes  in  asset  values  represent  changes  in 
wealth,  but  for  assets  whose  value  fluctuates 
widely and frequently around the underlying
trend,  some  smoothing  of  short  run  fluctuations 
will  be  desirable  for  many  analytic  purposes. 
This  question  will  be  of  importance  both  for 
cross-sectional  comparisons  of  families  who 
hold  their  assets  in  different  forms  and  for 
analysis  of  changes  in  wealth  over  time.   The 
message  for  the  Census  Bureau  is  that  the 
SIPP  data  should  be  made  available  in  a  way 
which  allows  the  user  the  flexibility  of 
using  alternative  methods  of  asset 
valuation. 
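
One simple form such smoothing might take is a trailing moving average of reported asset values. The sketch below is only an illustration: the monthly values are invented, and the three-month window is an arbitrary assumption rather than a method proposed for SIPP.

```python
# Hypothetical month-end values of a respondent's stock holdings.
stock_values = [100, 120, 90, 130, 110, 150]

def moving_average(values, window):
    """Trailing moving average.

    For the first window-1 points, average whatever history is available.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# The three-month window is an assumed choice, not one taken from the text.
smoothed = moving_average(stock_values, 3)
raw_range = max(stock_values) - min(stock_values)
smoothed_range = max(smoothed) - min(smoothed)
print("raw range:", raw_range, "smoothed range:", smoothed_range)
```

A user given the month-by-month values could substitute any other valuation rule for the averaging function, which is the flexibility asked for above.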


Owner-Occupied  Housing 

For  the  majority  of  middle-income  families, 
the  family  home  represents  their  most 
substantial  investment.   In  comparing  the 
income  and  wealth  of  families  which  do  and  do 
not  own  their  own  homes,  failure  to  include  the 
implicit  income  received  from  living  in  an 
owned  home  produces  misleading  comparisons. 

This  can  be  illustrated  by  comparing  the 
incomes  of  two  hypothetical  retired  couples. 
Both  have  identical  incomes  from  pensions  and 
social  security.  The  A's  live  in  their  own 
fully-paid-for home, which has a market value
of $100,000. The B's have sold their similar
home  for  $100,000  and  invested  the  proceeds  in 
tax-exempt  bonds  which  provide  them  with  income 
of  $10,000  a  year. 

On  a  simple  comparison  of  cash  income  the 
B's  are  $10,000  better  off.  However,  they  pay 
$10,000  per  year  in  rent  on  their  Florida 
apartment,  an  expense  the  A's  do  not  have. 

In  fact,  the  A's  and  the  B's  have  equivalent 
incomes  and  equivalent  wealth,  which  they  have 
chosen  to  utilize  in  different  ways,  and  a 
comparison  based  only  on  money  income  is  quite 
misleading. 
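
The arithmetic of the comparison can be sketched as follows. The pension figure is a hypothetical value, since the example states only that the two couples' pensions are identical, and the rental value of the A's home is assumed equal to the rent the B's pay, as the example implies.

```python
# Illustrative sketch of the A/B comparison.
PENSION = 15_000        # assumed: identical pension and social security income
BOND_INCOME = 10_000    # the B's tax-exempt bond income (from the example)
RENT = 10_000           # the B's annual rent (from the example)
IMPUTED_RENT = RENT     # assumed rental value of the A's fully paid-for home

def money_income(pension, investment_income):
    """Cash income only, the narrow definition."""
    return pension + investment_income

def expanded_income(pension, investment_income, imputed_rent):
    """Cash income plus implicit income from owner-occupied housing."""
    return pension + investment_income + imputed_rent

a_money = money_income(PENSION, 0)
b_money = money_income(PENSION, BOND_INCOME)
a_expanded = expanded_income(PENSION, 0, IMPUTED_RENT)
b_expanded = expanded_income(PENSION, BOND_INCOME, 0)

print("money income gap (B - A):", b_money - a_money)
print("expanded income gap (B - A):", b_expanded - a_expanded)
```

On money income alone the B's appear $10,000 ahead; once the implicit rental income of the A's home is counted, the gap disappears.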

My  example,  of  course,  is  greatly  over- 
simplified.  I  have  left  out  the  taxes, 
maintenance  expenses,  and  so  forth  that  the  A's 
incur  living  in  their  owned  home. 
Introduction  of  these  and  other  complexities 
would  be  required  in  any  real  life  statistical 
presentation,  but  this  would  not  alter  the  need 
to  include  the  value  (income)  derived  from  home 
ownership  in  any  valid  comparison  of  the 
relative  well-being  of  renters  and  home  owners. 

The  Bureau  of  Economic  Analysis  and  the 
Bureau  of  Labor  Statistics  have  wrestled  with 
this  problem  for  years,  and  have  found  ways  of 
handling  it.   BEA  includes  the  imputed  value  of 
the  shelter  provided  by  owner-occupied  homes  in 
the  personal  consumption  expenditure  component 
of  GNP  and  the  equivalent  imputed  income  in 
national  income.   BLS  now  uses  rental 
equivalence  to  estimate  home  ownership  costs 
for  the  Consumer  Price  Index.   With  the  new 
emphasis  on  noncash  income,  the  Census  Bureau 
also  needs  to  develop  ways  of  including  income 
from  home  ownership  in  its  income  tables. 

This  issue  was  brought  to  mind  by  both  the 
Radner  and  the  Vaughan  papers.   Radner  isolates 
that  group  of  elderly  who  have  both  low  incomes 
and  limited  assets  and  compares  them  to  the 
universe  of  all  elderly  households.   This  is  a 
very  useful  cross-classification,  and  one  would 
like  to  see  how  it  would  look  with  an  expanded 
income  definition  which  includes  noncash 
income.   Looking  only  at  their  cash  incomes  the 
financial  picture  for  the  low  income,  low  net 
worth  group  is  grim,  indeed.   The  gap  between 
this  group  and  other  elderly  households  would 
be  narrowed  by  the  inclusion  in  income  of  food 
stamps,  medicaid,  and  other  means-tested 
noncash  assistance,  but  it  would  be  widened  by 
including  imputed  income  from  home  ownership. 
If  the  income  concept  is  widened,  as  it  should 
be,  it  should  be  done  comprehensively  to 
include  the  noncash  income  received  from  all 
major sources, not just from government
assistance. 

The  role  of  noncash  income  also  needs  to  be 
considered  in  interpreting  the  Vaughan  paper. 
Vaughan  asserts  that  "the  aged  require  less 
income  than  the  nonaged  to  sustain  a  comparable 
level  of  living.   For  example,  they  are  able  to 
maintain  equivalent  nutritional  status  at  lower 
levels  of  food  consumption,  they  often  live  in 
fully  paid  for  owner  occupied  housing,  no 
longer  experience  the  expenses  associated  with 
working  and  generally  a  smaller  portion  of 
their  incomes  is  devoted  to  taxes." 

If  this  statement  is  correct  at  all,  it  is 
correct  only  because  the  income  definition  on 
which  it  is  based  is  pre-tax  money  income. 


This  is  the  concept  for  which  data  is  most 
readily  available,  but  it  is  not  the  most 
appropriate  concept  for  making  comparisons  of 
income  needs  (or,  as  Vaughan  terms  them, 
"family  equivalence  scales"). 

The  better  concept  would  be  after-tax, 
after-transfer  money  and  non-money  income, 
including imputed income from owner-occupied
housing. I suspect such a comparison would
show that it costs the elderly more, not less,
to  achieve  any  given  standard  of  living.   Many 
of  these  costs  are  met  by  Medicare  and 
Medicaid,  however,  and  others  by  the  in-kind 
return  realized  by  continuing  to  utilize  the 
house,  furnishings,  and  other  consumer  durables 
purchased  earlier  in  the  life  cycle.   It  is 
only  because  a  smaller  proportion  of  their 
living  costs  are  met  from  current  money  income 
that  the  elderly  may  be  able  to  achieve  a  given 
standard  of  living  with  less  "income,"  narrowly 
defined. 

These  three  papers  on  SIPP  are  tantalizing 
in  their  promise  of  the  value  of  the  data 
becoming  available.   They  also  serve  to  remind 
us  of  the  conceptual  and  analytic  work  still  to 
be  done  if  SIPP  data  is  to  be  given  the  most 
useful and meaningful presentations possible.

SURVEY OF INCOME

AND 

PROGRAM  PARTICIPATION: 

SESSION  II 


This  section  is  comprised  of  five  papers  presented 
in  this  session  which  was  sponsored  by  the  Section 
on  Social  Statistics. 


TOWARD  A  LONGITUDINAL  DEFINITION  OF  HOUSEHOLDS 
David  Bryon  McMillen  and  Roger  A.  Herriot,  Bureau  of  the  Census 


Introduction  and  Background 

Data  collection  and  analysis  in  the  social 
sciences  generally  focus  on  cross-sectional 
surveys  such  as  the  Current  Population  Survey 
(CPS). Consequently, most of our concepts and
data analysis tools are structured around point
estimates of some phenomenon or characteristic.
To the extent that we try to develop longitudinal
concepts and measures of social phenomena, that
is to say viewing events across time rather than
at one point in time, we conflict with these
cross-sectional structures. It is the goal of
this paper to confront that cross-sectional/
longitudinal conflict and attempt some
reconciliation. More specifically, this paper
attempts to develop longitudinal definitions of
households and families which are useful for
observing these units across time and for
constructing aggregate characteristics across
that time period, while not creating serious
conflict with our cross-sectional constructs of
household type and composition. We begin this
exercise by examining cross-sectional household
concepts from the CPS and recounting the
deficiencies of that perspective. Next we will
examine several types of longitudinal
definitions, identify the type that is usually
cited as most useful, and describe three
definitions within that framework. In the third
section of this paper we will evaluate the
definition options available in terms of utility
as well as what is possible given the data at
hand to implement such a definition. Finally, we
will illustrate how this definition might be used
in calculating aggregate household
characteristics and in tabulations of the number
of households, household types, and household
characteristics.

Point  Estimates  and  Longitudinal  Measures 

The  household  definitions  used  in  the  CPS 
serve as adequate measures for the intended
cross-sectional purposes. Indeed, few people
argue  that  these  definitions  create  a  problem 
when  estimating  the  number  of  households  by  type 
at  the  time  of  the  survey.  However,  when  those 
definitions  are  used  in  conjunction  with  other 
variables,  problems  begin  to  develop. 

Discussions  on  measuring  annual  household 
income from the CPS center around the
retrospective nature of the measure. Household
members as of March are asked to recall their
income for the previous calendar year, and the
incomes of all members are aggregated to create a
household income. The problem centers on the
varying lengths of household membership and the
unvarying annual reference period for income. If
a person is a member of the household for only
part of the year and, thus, contributes income
for only part of the year, that person's entire
annual income is still included in the household
income. Similarly, persons who are not
members of the household at the time of the survey
are not included in calculating the annual
household income, even though they may have
contributed income for most of the year. This type
of criticism is often used to question the
adequacy of the CPS income measures; however, it
is better viewed as an example of the problems
created by combining a point estimate of household
composition with a longitudinal (annual)
measure of income. Inevitably, the compromises
necessary to combine such cross-sectional and
longitudinal constructs produce a less than
ideal measure.
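
The distortion can be seen in a small numerical sketch. The three household members and their incomes below are invented for illustration, and the month-by-month total is only one possible alternative, not a definition taken from the CPS or SIPP.

```python
# Hypothetical household members. "months_in_household" lists the calendar
# months (1-12) of the reference year in which the person lived in the household.
people = [
    {"name": "head", "monthly_income": 1000,
     "months_in_household": list(range(1, 13)), "present_at_interview": True},
    {"name": "joiner", "monthly_income": 800,
     "months_in_household": [10, 11, 12], "present_at_interview": True},
    {"name": "leaver", "monthly_income": 1200,
     "months_in_household": list(range(1, 10)), "present_at_interview": False},
]

def static_annual_income(members):
    """Point-estimate approach: full-year income of everyone present at interview."""
    return sum(12 * p["monthly_income"]
               for p in members if p["present_at_interview"])

def month_by_month_income(members):
    """Assumed alternative: count income only for months actually spent in the household."""
    return sum(p["monthly_income"] * len(p["months_in_household"])
               for p in members)

print("static:", static_annual_income(people))
print("month by month:", month_by_month_income(people))
```

The static figure counts nine months of the joiner's income that the household never shared and omits nine months of the leaver's income that it did.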

Similar  criticism  of  the  CPS  household  data 
can  be  made.  If  we  examine  consecutive  March 
measures  of  the  distribution  of  households  by 
type  we  observe  little  change.  The  CPS  measure 
of  households  masks  most  of  the  interesting 
change in the distribution of households. For
example, in recent years the number of married
couple households has changed at a rate of less
than 1 percent a year, or about 200,000 households.
Concurrent with that indistinguishable
change  are  over  3  million  marriages  and  divorces, 
not  to  mention  changes  in  household  type  as  a 
result  of  the  death  of  one  member.  The  small 
net  change  creates  the  appearance  of  stability, 
while  masking  considerable  activity.  Again,  the 
problem is not so much the inadequacy of the
data,  but  rather  the  difficulty  of  measuring 
longitudinal  events  with  point  estimates. 

When  criticism  is  leveled  against  a  particular 
measure,  the  problem  often  is  not  the  measure  but 
rather  the  incongruity  between  the  measuring 
instrument and the time frame being considered.
The  examples  used  above  are  annual  measures,  but 
the  same  problem  exists  regardless  of  the  length 
of  time.  Most  social  measurement  is  discrete 
while  time  is  continuous.  The  goal  of  course  is 
to  get  to  the  point  where  the  difference  between 
the  two  is  trivial  and  can  be  easily  ignored. 

In  summary,  much  of  the  criticism  of  CPS 
measures  can  be  attributed  to  this  discrepancy 
between  the  time  reference  of  the  social 
measurement  and  the  cross-sectional  survey 
instrument. One solution to the problem is to
decrease that difference by repeatedly measuring
the phenomenon in question during the year.
Those observations can then be aggregated to
produce measures which cover a number of time
intervals. It is from this perspective that
the  design  for  the  Survey  of  Income  and  Program 
Participation  (SIPP)  has  developed. 

The design of this survey is to interview the
household every 4 months over a 2 1/2-year
period, and to collect in those interviews
monthly data on household composition, income,
labor force participation, and a number of
other characteristics. Those monthly data can
then be aggregated to larger temporal units such
as quarterly or annual measures. However, with
the idea of aggregating monthly units comes the
problem of defining which units should be
aggregated across time and which should not. That
is to say, which households are the same over the
period, which exist at the beginning of the
period but not at the end, and which exist at the
end but not at the beginning. Without such a
definition, aggregating above the person level
is impossible.

Defining households across time is an issue
that has been debated for several years without
resolution; however, it is necessary that the Census
Bureau decide which of the many proposed methods
will be used for the publication series from the
Survey of Income and Program Participation
(SIPP). This paper moves one step nearer to
that decision by summarizing the proposals on
how longitudinal households should be defined
and recommending a system to be used. In
addition, this paper will begin to identify what
conceptual and processing problems remain
unsolved given the definition chosen.

Several proposals have been offered for defining
longitudinal households. Griffith (1978)
outlines six measures, one of which is the
traditional Current Population Survey (CPS)
definition. Griffith also proposes that the Census
Bureau use several definitions in tabulating
households from SIPP. Others who have proposed
variant measures include two from Davey (1980),
Crosby (1979), Lane (1978), and Smith-Yeas (1981).
Yeas (1981), in summarizing past work, identifies
four keys for labeling definitional methods:
static; dynamic; static-dynamic hybrids; and
attribute methods. In the following section we
will discuss several types of longitudinal
definitions for households.

Types of Longitudinal Household Definitions

Static definitions of households fix the
household composition and characteristics at a
given point in time and calculate other
attributes from that point. These definitions are
the standard cross-sectional perspective on
households common to the CPS and other similar
surveys. Using a point estimate of household
composition, other attributes are calculated
assuming that the composition chosen existed for
the full period. Thus, some estimate of annual
income for each member is aggregated to produce
an annual household income, regardless of
whether members were there for the full period
or joined the day before the interview. This
type of household definition is the logical
outgrowth of cross-sectional surveys where
interviews are conducted at one point in time and
aggregates of past events are a function of
respondent recall. This type of definition
coincides with the instantaneous conception of
a household which we use from day to day.

Static definitions are both useful and familiar
for cross-sectional surveys; however, they
serve little purpose in longitudinal surveys
other than to provide familiarity. Static
definitions, for a number of reasons, ignore the
dynamic activity common to households: households
are formed by marriages and dissolved by divorces,
children leave home and set up their own
household or move in with relatives, and so on. It
is difficult to justify the expense and
complexities necessary to measure these dynamics if
we then propose to ignore them in defining
households. It is useful to portray static
definitions here, however, for they represent one end
of the definition spectrum.

Dynamic definitions of households occupy the
other end of the spectrum. These definitions
recognize change as inherent in observing
households across time, and attempt to
incorporate that change into the definition. Thus,
household characteristics and attributes become
variables to be measured as households change, are
created, and dissolve during the period of
observation. In other words, these definitions
attempt to minimize the extent to which dynamic
concepts are forced into static categories.
Needless to say, dynamic definitions are better
suited to a longitudinal survey such as SIPP;
however, such definitions are difficult to devise
and to carry out.

One of the first difficulties encountered with
dynamic definitions is that they produce measures
which are not readily familiar to many of those
who use census data. The most common illustration
of this point uses household size. Static
definitions produce measures of household size
such as 2 or 3, which are intuitively meaningful.
That is, they fit with our instantaneous
image of households because they represent the
household size at one point in time, the survey
date. Dynamic definitions produce measures of
household size which look more like averages
across a number of households: 2.4 or 3.2 members
in the household. These measures are summary
statistics of the household experience, summarizing
across time. In other words, dynamic definitions
force us out of that instantaneous view
of households and into thinking about them as
something which changes across time; our
statistics produce a summary of that change. Yeas
(1981) suggests several ways of handling the
problem of household size (rounding, using modal
size, etc.); however, it may be best to reeducate
the reader to think of annual household
characteristics as the aggregate of a number of discrete
experiences. There are other more troublesome
problems to be dealt with in developing dynamic
household definitions. I will deal here only
with definitional problems, acknowledging that
there are also measurement problems to be
considered later in this paper.
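
To make the household size example concrete, the sketch below summarizes a hypothetical household observed in each of 12 months. The monthly sizes are invented, and the rounded and modal figures correspond only loosely to the kinds of adjustment attributed to Yeas (1981) above.

```python
from statistics import mean, mode

# Hypothetical household: two members for 4 months, then three for 8 months.
monthly_size = [2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]

def summarize_size(sizes):
    """Return the dynamic (average) size plus two integer-valued alternatives."""
    average = mean(sizes)
    return {
        "average": average,          # summary across the period
        "rounded": round(average),   # average rounded to a whole number
        "modal": mode(sizes),        # size observed in the most months
        "at_survey": sizes[-1],      # static view: size in the final month
    }

summary = summarize_size(monthly_size)
print(summary)
```

The average (about 2.7 members) describes the household's experience over the year, while the rounded, modal, and survey-date figures each recover a familiar whole number at the cost of discarding the change.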

Unlike many demographic variables, there are
several aspects of dynamic households for which
there is no definition or consensus as to what
constitutes a change in type. For example, if a
husband and wife divorce, there are several ways
we can account for this on our household ledger.
We could count this as the dissolution of the
husband/wife household and the formation of two
new households. This results in a net increase
of one household in existence at that point in
time and an increase of two households when
counting the number of households existing during
the period. Alternatively, we could allow one
household to be the continuation of the
husband/wife household. Again we have a net increase of
one household in existence, but because of the
continuing household, we increase only by one the
number of households existing during the period.
To generalize, a household may experience a
number of changes across time and we can converse
easily about the discrete events. However, we do
not have a clear concept of when those changes
result in the formation of a new household and
the dissolution of an old household. One extreme
is to say that any change to the household
composition results in a new household. At the
other extreme are those who say that this is an
issue without resolution and suggest that we
abandon the measurement of household
characteristics except as they pertain to individuals. In
other words, before we can implement a dynamic
definition of households, we must first develop a
set of continuity rules or accounting principles
which identify cases of household dissolution,
household formation, and cases where two
households at two points in time are identified as the
same household.
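
The two ways of entering a divorce on the household ledger can be
sketched in a few lines of Python. This is only an illustration of
the arithmetic in the paragraph above; the function name and the
dictionary keys are hypothetical and are not part of any SIPP
specification.

```python
def ledger_counts(one_household_continues):
    """Count households for a single divorce under two accounting choices.

    Before the divorce one household exists; afterwards two exist.
    """
    existing_before = 1
    existing_after = 2
    # Households newly formed by the event: one if a household is treated
    # as the continuation of the original, two if the original is dissolved.
    formed = 1 if one_household_continues else 2
    return {
        "net_increase_at_point_in_time": existing_after - existing_before,
        "increase_in_households_during_period": formed,
    }


print(ledger_counts(one_household_continues=False))
print(ledger_counts(one_household_continues=True))
```

Under either choice the point-in-time count rises by one; only the
count of households existing during the period differs.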

Most longitudinal household definitions that
have been proposed fall somewhere between the
static and dynamic extremes. Each acknowledges
the difficulty of developing continuity rules,
and proposes some static-dynamic blend to finesse
those problems. A number of cross-sectional/
dynamic hybrid definitions have been proposed.
One set of these definitions is quasi-dynamic and
acknowledges that a set of continuity rules has
yet to be developed. Another set is basically
a static system designed to avoid the continuity
dilemma. Neither of these alternatives is
particularly attractive. In the latter case, most of
the alternatives create as many problems as they
solve. In the former case, if we are going to
develop a set of continuity rules, then there is
little need for a hybrid definition.

Attribute-type definitions are drawn from the
work done on the Panel Study of Income Dynamics
(PSID). The goal in these definitions is somewhat
different from that in the previous discussion.
Rather than attempting a longitudinal definition
of households, this system calculates a
series of cross-sectional households at some
smaller time interval, and then ascribes the
characteristics of the household to each
individual. Measures for some longer time interval
are then calculated by aggregating the
series of point estimates across each individual
to represent that person's household experience
during the period. This system will yield the
number of persons who lived in "households" with
a monthly income of $1000 to $1500 during the
year; however, it is more difficult to derive
the number of households with an annual household
income of $12,000 to $18,000. In fact, without
an additional set of assumptions, this system
does not produce an accounting of households
across time. In order to develop household
statistics within the attribute system it is
necessary to assume, for example, that the
householder at the end of the period will represent
the household experience. Then the household
attributes ascribed to that person are aggregated
to produce household characteristics. Those
aggregated attributes represent the
householder's experience during the period, but not
necessarily the experience of the other persons
in the household at that time. As can be seen,
an assumption such as this contains many of the
weaknesses of using a static or cross-sectional
definition of households, with few obvious
advantages at the household level.
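
As a rough illustration of the attribute approach, the sketch below
computes a cross-sectional household income for each month, ascribes
it to every person in that household, and then sums the monthly
values for each person. The record layout, the household labels, and
the dollar amounts are invented for the example; they are not drawn
from PSID or SIPP files.

```python
from collections import defaultdict

# Hypothetical monthly records: (month, household_id, person, person_income).
# Person B leaves household H1 after month 1 and forms household H2.
records = [
    (1, "H1", "A", 900), (1, "H1", "B", 300),
    (2, "H1", "A", 900), (2, "H2", "B", 300),
]


def person_household_income(records):
    """Sum, for each person, the income of whichever household
    that person lived in during each month."""
    # Step 1: cross-sectional household income for every month.
    monthly_household = defaultdict(int)
    for month, household, _person, income in records:
        monthly_household[(month, household)] += income
    # Step 2: ascribe the household value to each member and aggregate
    # across months for that person.
    totals = defaultdict(int)
    for month, household, person, _income in records:
        totals[person] += monthly_household[(month, household)]
    return dict(totals)


print(person_household_income(records))
```

The result is a measure for each person, not for each household,
which is why an added assumption (such as following the householder
at the end of the period) is needed before households can be counted.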

In summary, there are four types of household
definitions which have been proposed for use
with longitudinal surveys: cross-sectional;
dynamic; cross-sectional/dynamic hybrids; and
the attribute system. The cross-sectional
approach is clearly inappropriate since it
ignores the dynamic nature of the data. The
attribute system incorporates the dynamic aspects
of the data but dodges the issue of developing
continuity rules for households. Consequently,
this system, by ignoring the social structure of
households, produces many of the same problems
raised in criticism of the CPS measure of annual
household income. It is clear that a dynamic
definition is the most desirable alternative, but
agreement on just how that definition ought to
be formed is elusive.

In  the  following  section,   I  will   discuss 
dynamic  definitions  of  households  and  present 
three  sets  of  continuity  rules.     The  first, 
proposed  by  Norton  (1982),  defines  change,  rather 
than  continuity,  as  movement  between  major  types 
of  households.     The  second  is  a  reciprocal 
majority  rule  system  developed  by  Dicker  and 
Casady  (1982).     The  third,  developed  by  Siegel 
(1982),  sets  out  a  set  of  demographic  principles 
to  which  continuity  rules  should  conform  and 
then  develops  a  continuity  rule  within  that 
framework.

A  Dynamic  Definition  of  Households 

A dynamic definition of households is much
easier to describe than to execute. Dynamic
household definitions do little more than
acknowledge change in household composition or type
and determine that the change must be
incorporated. In other words, dynamic definitions
acknowledge a set of accounts and to some extent
set up the framework for those accounts, but
usually do not explicate the principles by which
the ledger is filled.

For SIPP, it is suggested that we tabulate the
changes that occur in household composition and
type during the period covered by a panel. That
is to say, we must acknowledge that households
change, are formed, dissolve, and sometimes
stay the same. In the simplest form then, our
accounts will record the formation and
dissolution of households from our original sample
as well as changes in size and type. These
tabulations of change will use as often as
possible the standard descriptors, such as family
and nonfamily households, and the categories
associated with those types. Our dynamic
definition begins with the static CPS definition
applied at the beginning of the panel and then
traces the changes that occur to those households
across the duration of the panel. Norton (1982)
suggests that we define a household as changed
when its membership changes in such a manner as
to classify it as a different type of
household. He proposes that we acknowledge change
between the following types of households: 1)
married-couple household; 2) male family-household;
3) female family-household; 4) male
nonfamily-household; and 5) female nonfamily-household.
Thus any change which results in a
household falling into a different category results
in the dissolution of one household and the
formation of another. To illustrate this system
consider a husband-wife household which
experiences a divorce. Norton's scheme would
consider the husband/wife household dissolved
and two new households formed. The two new
households would be family households if there
were children present and nonfamily households
otherwise.
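
Norton's type-change rule can be sketched as a comparison of
household types at two observations. The three-field description of
a household used here (married couple present, sex of householder,
relatives present) is an assumption made for the example, not
Norton's own coding scheme.

```python
def norton_type(married_couple, householder_sex, relatives_present):
    """Assign one of the five household types listed above."""
    if married_couple:
        return "married-couple"
    kind = "family" if relatives_present else "nonfamily"
    return f"{householder_sex} {kind}"


def norton_changed(before, after):
    """A household is dissolved, and a new one formed, whenever its
    type differs between two observations."""
    return norton_type(*before) != norton_type(*after)


# Divorce: the wife stays with the children, the husband lives alone.
married = (True, "male", True)
wife_with_children = (False, "female", True)
husband_alone = (False, "male", False)

print(norton_changed(married, wife_with_children))
print(norton_changed(married, husband_alone))
```

Both successor households differ in type from the married-couple
household, so under this rule the original household is dissolved
and two new households are formed.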

A second longitudinal definition has been
proposed by Dicker and Casady (1982) for use with
the National Medical Care Utilization and
Expenditure Survey (NMCUES). In a slight
departure from the definitions discussed here,
Dicker and Casady focus on families rather than
households; however, that does not pose any
serious problems. Like others who have
approached this problem, Dicker and Casady begin
with the realization that there is no consensus
on when families begin or cease to exist; rather,
such transitions are in part a function of the
problem being investigated.

The NMCUES model for defining longitudinal
families requires that antecedent and descendant
families or, in their terms, predecessor and
successor families be defined reciprocally.
That is to say, any rules defining relationships
across time must be applied to both families
simultaneously. Dicker and Casady next
demonstrate that when applying these rules you wind up
with links between a number of households. That
is to say, any family is likely to have more
than one predecessor and more than one successor
family. Thus, the problem lies in defining
which of the possible pairs will be defined as
the longitudinal family. As with most
longitudinal definitions, the system eventually
reduces the decision to what will be defined as
the same and what will be defined as change.

Dicker and Casady chose to define sameness by
a majority rule. The successor family which
receives the majority of members from the
predecessor family is identified as the "principal
predecessor." These two families then form the
linked or longitudinal family unit. Finally, in
cases where families split evenly, it should be
noted that the NMCUES model does not define a
longitudinal unit, but rather dissolves the
predecessor family and considers all successor
families as newly formed.
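
A minimal sketch of the majority rule as described above follows. It
links a predecessor family to the successor that receives a strict
majority of its members and returns no link on an even split. The
function name is hypothetical, and the reciprocal check in the other
direction that the NMCUES model calls for is not shown.

```python
def majority_successor(predecessor, successors):
    """Return the successor family holding a strict majority of the
    predecessor's members, or None when no successor does."""
    for successor in successors:
        shared = len(predecessor & successor)
        if 2 * shared > len(predecessor):
            return successor
    return None


family = {"A", "B", "c", "d"}

# Divorce: A leaves, B keeps both children.
print(majority_successor(family, [{"A"}, {"B", "c", "d"}]))

# Even split: each parent keeps one child, so no longitudinal unit.
print(majority_successor(family, [{"A", "c"}, {"B", "d"}]))
```

In the even-split case the predecessor family would be treated as
dissolved and both successor families as newly formed.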

Five rules of relationship guide Siegel's
(1982) development of household demography. The
first two state that a household can have only
one descendant and one antecedent that is
identified as the same household. That is to say, when
a household splits, only one of the subsequent
households can be identified as "the same"
household. The third and fourth rules identify
households which are not the same as some preceding
or succeeding household. Households which are
not the same as any antecedent household are
newly formed; a household not the same as any
descendant household is dissolved. The final
rule, one of transitivity, states that if A is
the same as B and B is the same as C, then A
must be the same as C. All that remains to
complete this set of accounting principles is a
definition of sameness. The rule is offered that
two households separate in time and having the
same householder are the same household.
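
The householder rule lends itself to a simple set comparison. In the
sketch below each observation is a mapping from householder to
household members; that data layout, and the choice of A as the
householder of the original household, are assumptions made for
illustration.

```python
def siegel_accounts(previous, current):
    """Classify households between two observations by householder.

    previous, current: dicts mapping householder name -> set of members.
    Two households are "the same" when they share a householder.
    """
    prev_heads, curr_heads = set(previous), set(current)
    return {
        "continuing": sorted(prev_heads & curr_heads),
        "formed": sorted(curr_heads - prev_heads),
        "dissolved": sorted(prev_heads - curr_heads),
    }


time1 = {"A": {"A", "B", "c", "d"}}
time2 = {"A": {"A"}, "B": {"B", "c", "d"}}
print(siegel_accounts(time1, time2))
```

Because a person can be householder of only one household at a time,
matching on the householder yields at most one antecedent and one
descendant, and sameness is transitive across observations.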

Continuity based on following the householder
has been criticized because of the somewhat
arbitrary way in which the householder is defined,
and because it creates what some consider
unreasonable change within a continuing household.
The most frequently cited example of such change
is following the male after a divorce when the
children remain with the female. An alternative
to Siegel's householder rule which is consistent
with his demography of households is to follow
the principal person. The principal person is
the female in the married-couple household and
the householder in all other households; this is
the concept used in developing household weights
in CPS. By following the principal person, we
alleviate the problem cited above. Of course
the problem now occurs when the children stay
with the male following a divorce, a much less
frequent event.

FIGURE 1

          Norton        Siegel        Dicker

Time 1    ABcd*         ABcd*         ABcd*

Time 2    A*   Bcd*     A    Bcd*     A*   Bcd

Time 3    Ac*  Bd       Ac   Bd       Ac   Bd

Time 4    Acd  B*       Acd  B        Acd  B

*New household

Let us consider briefly the strengths and
weaknesses of these three systems focusing on
two issues: 1) the number of households created
across time, and 2) the extent to which the
definition promotes or discourages longitudinal
analysis. Norton's system comes the closest to
maximizing change and, as a result, creates more
households than the others. Consider the divorce
example from above, but suppose two children
remain with the female. Both Siegel and Dicker
and Casady would produce a total of two households
resulting from the divorce. Norton's system
produces three households: 1) the original
married-couple household; 2) a male
nonfamily-household; and 3) a female
family-household. Let us continue
following these people and assume that the
children leave the female one at a time and join the
male (see figure 1). In Norton's scheme, the
first move by a child would produce the
dissolution of the male nonfamily-household and the
creation of a male family-household. Our
longitudinal count of households now stands at four.
Neither Dicker and Casady nor Siegel would
produce new households as a result of the children
moving. Under Norton's scheme, when the second
child moves, the female family-household is
dissolved and a female nonfamily-household is
created. The male family-household remains
unchanged. Over these four observations, Norton's
system produces five households; both Siegel and
Dicker and Casady produce only two by allowing the
continuation of a household across these changes. Let
us then look at those continuing households. The
continuous household for Siegel's householder
rule starts as a four-member married-couple
household, dwindles to one member (the male), and
increases to two with the addition of one child
and then to three with the addition of the second
child. On the other hand, Dicker and Casady's
continuous household begins as the four-person
married-couple household and is transformed by
the divorce to a three-person female-headed
household, then to a two-person and, subsequently,
a one-person nonfamily-household. It should be
noted that these two continuous households
follow opposite courses after the divorce. The
continuous household under the principal-person
rule would be identical to Dicker and Casady's
continuous household.

We should stop at this point and examine this
simple yet realistic example of household change.
First, if we are interested in counting households
(the number or percent of households in poverty
during the year, for example), then a
continuity system such as Norton's, which allows
for continuity in only the most trivial cases,
creates a much larger number of households.
Suppose the female half of our mythical
household was in poverty after the divorce. By
Norton's count, during that year we would have
20 percent of the households in poverty (one of
five). Dicker and Casady and Siegel would show 50
percent of their households in poverty (one of
two). A second observation to be made here is that,
regardless of what sort of continuity rule we
adopt, we will observe households which contain
a wide variety of change. The question we must
ask is whether we accept that households
undergo such change and remain intact.
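
The poverty percentages follow directly from the number of
households each system creates. The sketch below counts the
new-household markers in Figure 1 and divides the one household
assumed to be in poverty by that count. The transcription of the
figure into strings and the function names are illustrative only.

```python
# Rows of Figure 1, one string per observation; "*" marks a new household.
FIGURE_1 = {
    "Norton": ["ABcd*", "A* Bcd*", "Ac* Bd", "Acd B*"],
    "Siegel": ["ABcd*", "A Bcd*", "Ac Bd", "Acd B"],
    "Dicker and Casady": ["ABcd*", "A* Bcd", "Ac Bd", "Acd B"],
}


def households_created(rows):
    """Count the households that existed at any time during the period."""
    return sum(row.count("*") for row in rows)


def poverty_rate(rows, households_in_poverty=1):
    """Percent of households in poverty, given how many were poor."""
    return 100 * households_in_poverty / households_created(rows)


for system, rows in FIGURE_1.items():
    print(system, households_created(rows), poverty_rate(rows))
```

The same family history thus yields a poverty rate of one in five
under Norton's system and one in two under the other two.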

As noted above, each of the three systems has
its constituency and its detractors. Siegel's
system is criticized because of the disjunctures
that can occur following a divorce. For example,
the continuous household will follow a male
householder who divorces his wife even though
the wife and children remain in the housing
unit as a group. Similarly, Norton's scheme is
criticized because of the lack of attention
paid to continuity. Dicker and Casady are
criticized for the mechanical nature of the
majority rule. Why, it is asked, should one
person make all the difference in whether a
family is designated new or continuous?

None of the definitions of longitudinal
households offered in the literature has proved
viable. However, in the process of discussing
this issue with several demographers and
economists, it became clear that any definition
which labels a transition from a family household
to a nonfamily household as continuous causes
problems, although there are some cases where
the transition from nonfamily to family
household occurs within a continuous unit. The most
obvious of these is the marriage of two persons
who have been living together. Drawing on that
experience, we determined that we should develop
a longitudinal definition of families separate
from that for nonfamily households.

We begin with the CPS definition of a family
as two or more persons, one of whom is the
householder, related by birth, marriage, or adoption,
and residing together. To make this
cross-sectional definition dynamic, we must add the
time dimension or develop a continuity rule.
Thus, a longitudinal family is defined as two or
more related persons, at least one of whom is the
householder or spouse of the householder, who had
the same household experience over two or more
consecutive months. We further stipulate that
no more than one core family unit with children
can continue from a previous-month family. Three
levels of criteria are offered to distinguish
cases where both parents and children split into
two or more households. The first-level
criterion for continuity is that the family with
the most child-months is identified as continuous.
The second level, to distinguish between families
with the same number of child-months, is the
family with the most family-months. In the
third level, if two potential continuing units
tie on both of the above criteria, then the
continuing unit will be assigned randomly. Two
elements have been added to the CPS definition:
1) the time dimension, and 2) the inclusion of
spouse as part of the continuity criteria.
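
The three-level criteria can be sketched as a lexicographic
comparison with a random tie-break. The tuple layout and the
child-month and family-month figures below are hypothetical; the
text does not specify how those quantities would be computed from
the survey files.

```python
import random


def continuing_unit(candidates, rng=random):
    """Pick the continuing family among candidate units.

    candidates: list of (name, child_months, family_months) tuples.
    Level 1: most child-months. Level 2: most family-months.
    Level 3: random assignment among units tied on both.
    """
    best = max((child, family) for _name, child, family in candidates)
    tied = [c for c in candidates if (c[1], c[2]) == best]
    if len(tied) == 1:
        return tied[0][0]
    return rng.choice(tied)[0]


print(continuing_unit([("Bcd", 24, 36), ("A", 0, 12)]))
print(continuing_unit([("X", 12, 20), ("Y", 12, 30)]))
```

Passing a seeded `random.Random` instance as `rng` would make the
third-level assignment reproducible across tabulation runs.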

Let us examine this definition more carefully.
Consider again our four-step example of divorce
and then the movement of two children one at a
time from one parent to another. Following the
separation, the Bcd family would be the
continuing family because it contains two or more
members of the initial family, one of whom is
the householder or spouse. The A household
would be new because of the transition from
family to nonfamily status. Following the
movement of the first child, c, the Ac family is
considered newly formed because of the transition
from nonfamily to family status. Finally, the
movement of d from the Bd family to the Ac family
results in a new nonfamily household, B. Using
the notation from figure 1, we have:


1.  ABcd*

2.  A*     Bcd

3.  Ac*    Bd

4.  Acd    B*

Next,  we  must  confront  defining  continuity  for 
nonfamily  households.  A  nonfamily  household 
is  a  householder  living  with  nonrelatives  only. 
For  these  cases,  we  have  adopted  a  50-percent 
rule. As long as the householder and 50 percent
or more of the household members are the same at two
points in time, the household is considered a
continuous household. The distinction between
this and the majority rule is that, rather than
creating  new  households  for  even  splits,  this 
rule  provides  for  continuity.  This  definition 
of  family  and  nonfamily  households  provides  the 
basic  parameters.  What  remains  is  to  develop 
a  set  of  programming  specifications  to  implement 
this  definition. 
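
As a starting point for such specifications, the 50-percent rule
might be sketched as follows. The text does not say which
observation supplies the base for the percentage; this sketch
assumes the share is measured against the membership of the earlier
household, and the function and argument names are hypothetical.

```python
def continuous_nonfamily(before, head_before, after, head_after):
    """Apply the 50-percent rule to a nonfamily household.

    before, after: sets of household members at two points in time.
    The household continues if the householder is unchanged and at
    least half of the earlier members are still present.
    """
    if head_before != head_after:
        return False
    return 2 * len(before & after) >= len(before)


four_person = {"H", "x", "y", "z"}

# Even split: the householder and one other member remain.
print(continuous_nonfamily(four_person, "H", {"H", "x"}, "H"))

# Only the householder remains: fewer than half are the same.
print(continuous_nonfamily(four_person, "H", {"H"}, "H"))
```

Unlike the majority rule, the even split is counted as continuous
because the comparison uses "at least half" rather than "more than
half."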

Other  possible  longitudinal  units  or  groups 
exist  in  relation  to  federally  funded  support 
programs.  For  example,  food-stamp  units  and  AFDC 
units  are  defined  independently  of  the  household 
and,  in  fact,  households  may  contain  more  than 
one  of  these  units.  Longitudinal  units  for 
these  programs  will  be  defined  on  the  basis  of 
the  person  in  whose  name  the  program  application 
is  filed.  For  example,  in  a  husband-wife,  two- 
child  family,  the  male  is  defined  as  the  food- 
stamp  recipient.  If  he  leaves,  that  food-stamp 
unit is dissolved; a new one is formed if the
female  reapplies  and  is  found  eligible. 

Perspectives  on  Household  Characteristics 

In  this  section,  I  will  address  the  uses  of 
this  longitudinal  definition  of  households  and 
argue  that  we  need  to  tabulate  household  data 
from SIPP using at least two types of
longitudinal definitions. The need for two types of
definitions is a function of the kind of
household information needed.

I am arguing that we need to use both a
dynamic longitudinal and an attribute-type
household definition, because we are interested in
both the experience of households and of
individuals in households. There are some
characteristics like annual household income
which we need to examine both as a characteristic
of the household and of the individual. This is
only to say that there are multiple meanings
associated with the concept of income. In CPS,
where we have only one way of obtaining income
data, we attach all of those meanings to that
single measure. SIPP allows us to decompose that
measure and look at the components more carefully.

To  summarize,  I  have  argued  that  to  fully 
appreciate  the  household  dynamics  we  observe  in 
SIPP  and  to  portray  that  activity  over  a  year, 
we  should  provide  two  types  of  tabulations.  The 
first  tabulates  household  characteristics  using  a 
longitudinal definition and examines how changes
in some characteristics result in changes in
others.  The  second  type  of  tabulation  examines 
how  household  characteristics  affect  individuals 
across time.

LIFETIME WORK EXPERIENCE AND ITS EFFECTS ON EARNINGS: DATA FROM THE INCOME SURVEY DEVELOPMENT PROGRAM

John  M.  McNeil,  U.S.  Bureau  of  the  Census  and  Joseph  J.  Salvo,  NY  Department  of  City  Planning 

The extent to which persons remain attached
to the labor force over the course of their
working-age years has important economic and
social implications. Differences in labor
force attachment between men and women have
been cited as one major reason why women earn
less than men. This study presents data from
the 1979 Income Survey Development Program
(ISDP) on lifetime work interruptions and
examines the relationship between work
interruptions and earnings. Descriptive data
showing the extent to which men and women have
experienced work interruptions are presented,
followed by an analysis of the impact of work
interruptions on earnings. The study concludes
that work interruptions explain only a small
proportion of the earnings differential between
men and women.

The 1979 ISDP was a panel survey of
approximately 9,000 households that were visited at
3-month intervals over a period of a year and
a half beginning in February 1979. The survey,
part of the development stage of the new income
survey called the Survey of Income and Program
Participation (SIPP), was a joint effort of
the U.S. Department of Health and Human
Services and the U.S. Bureau of the Census. The
third wave questionnaire contained a section
on personal history and within that section
were questions on lifetime work interruptions.
The questions (reproduced in Figure A) asked
whether the person had ever been away from
work for 6 months or longer for each of three
reasons: (1) because he or she was not able to
find work, (2) because he or she was taking
care of home or family, and (3) because he or
she was ill or disabled. Beginning and ending
dates were recorded for each interruption. A
maximum of four interruption periods were
identified for each of the three possible
reasons for interrupting.

A major reason for the interest in data on
lifetime work experience is the desire to use
such data in the analysis of male-female
earnings differentials. The tenets of human
capital research have traditionally stressed the
importance of work experience patterns as a
determinant of earnings. The descriptive data
presented in the first part of this report
confirm that the lifetime labor force attachment
of women is weaker than that of men. Because
of interruptions for familial reasons, women
have a much higher overall rate of work
interruptions than men and they spend a much higher
proportion of their potential work years out
of the labor force. Such findings have led at
least some social scientists to posit that
traditional familial responsibilities are one
major reason why women earn less than men.
This section will describe selected studies of
the relationship between work interruptions
and earnings and will present an analysis
based on the 1979 ISDP data.

Previous Research

A major constraint in early efforts to
examine the relationship between experience and
earnings was the lack of data on lifetime work
experience. More recently, however, a number
of studies have been published which exploit
the important data which has been made
available from the National Longitudinal Surveys of
Labor Market Experience (NLS) and the Michigan
Panel Study of Income Dynamics (PSID).

Suter and Miller (1973) were among the first
to analyze the retrospective work history data
from the NLS. They studied a cohort of women
who were 30 to 44 years of age in 1967. Work
experience was based on a question which asked
about the total number of years in which the
person had worked at least 6 months. Suter and
Miller concluded that there was a close
association between earnings and length of work
experience among this cohort of women.

Mincer and Polachek (1974) extended the
analysis of the NLS retrospective data. They
specified two reasons why discontinuous work
history patterns might lead to lower earnings.
First, interruptions in market work lead to
lower levels of accumulated human capital.
Second, interruptions cause a depreciation in
existing human capital. That is, time spent
away from market work has a cost beyond the
effect of foregone experience. In their
analysis, Mincer and Polachek found that the
amount of time spent at home had a negative
impact on earnings even when experience was
also included in the earnings equation. They
concluded from their analysis that a
depreciation effect does, in fact, exist.

This finding was challenged by Sandell and
Shapiro (1978) on the grounds that the NLS data
used by Mincer and Polachek were subject to
various coding errors. They replicated certain
of the Mincer-Polachek research using a
corrected NLS file and concluded that the original
study had overestimated the depreciation effect.

Corcoran (1979) conducted an analysis of the
effect of experience and interruptions on
earnings using retrospective data from the PSID.
One of the major advantages of the PSID data
set was that the sample, in contrast to the
NLS samples, was representative of the female
population 18 to 64 years of age. Corcoran
found very little evidence of a depreciation
effect. There was no effect for White women
and only a minor effect for Black women.
Corcoran also argued that restricting the analysis
group to women 30 to 44 years of age is likely
to overestimate depreciation because many
women in this group have recently reentered
the labor market and are likely to be affected
by misinformation about job opportunities.

More recently, Mincer and Ofek (1982) used
NLS data for 30- to 44-year-old married women
to reaffirm the depreciation hypothesis. In
an analysis of longitudinal (rather than
retrospective) data from the NLS, they found
that reentry wage rates were lower than wage
rates at the time of labor force withdrawal.
Furthermore, longer interruptions carried
greater wage penalties. They also found,
however, that wage rates tended to grow rapidly
upon return to work. The observed amount of
depreciation, they concluded, is dependent
upon the length of the interruption and the
length of time spent back in the labor force.

ISDP  Data 

The effect of work interruptions on earnings
was examined by using the data described
earlier to construct variables representing
interruptions and experience. These variables were
included in regressions which related hourly
earnings to a set of explanatory variables.
The universe for this part of the study
consisted of all persons 21 to 64 years of age with
wage and salary income during the quarter
preceding the interview. Separate regressions
were run for men and women, with the log of
hourly earnings as the dependent variable.1
The interruption and experience variables
used in the regressions include the following:

UNEMP     = 1 if person had ever experienced
            an interruption due to an inability
            to find a job; 0 otherwise.
DISAB     = 1 if person had ever experienced
            an interruption due to illness or
            disability; 0 otherwise.
TIME-AWAY = Duration of all interruptions2 as
            proportion of potential work years.3
EXPER     = Number of potential work years minus
            duration of all interruptions.4
EXPERSQ   = The square of EXPER
FT        = 1 if the jobs the person has worked
            at have usually or always been
            full-time jobs; 0 otherwise.

The  interruption  variables  were  specified  in 
the  above  form  because  it  was  hypothesized  that 
earnings  could  be  affected  by  the  existence  of 
an  interruption  as  well  as  by  the  length  of  an 
interruption.  Because  interruptions  due  to  un- 
employment or  disability  had  a  relatively  small 
effect  on  the  proportion  of  potential  work  years 
spent  away  from  work,  they  were  entered  as 
zero-one  dummy  variables.  Because  interruptions 
for  familial  reasons  had  a  very  strong  effect 
on  the  amount  of  time  spent  away  from  work, 
they  were  allowed  to  enter  the  equation  through 
their  effect  on  the  TIME-AWAY  variable.  The 
general  experience  variable,  EXPER,  was  entered 
in  its  own  form  as  well  as  in  its  squared  form, 
EXPERSQ. The inclusion of the squared form
was intended to capture the nonlinear effect
of experience on earnings. (The returns to
experience tend to flatten after some point.)
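
The flattening can be made concrete. With a log-earnings term of the form b1*EXPER + b2*EXPERSQ and b2 negative, the marginal return to a year of experience is b1 + 2*b2*EXPER, which reaches zero at EXPER = -b1/(2*b2). The sketch below applies that standard algebra to the EXPER and EXPERSQ coefficients printed in table D for all employed males and females; the function names are illustrative only, and the calculation is not one reported in the paper.

```python
def marginal_return(b_exper: float, b_expersq: float, years: float) -> float:
    """Derivative of b1*x + b2*x**2 with respect to x, evaluated at `years`."""
    return b_exper + 2.0 * b_expersq * years


def experience_peak(b_exper: float, b_expersq: float) -> float:
    """Years of experience at which the quadratic profile stops rising."""
    if b_expersq >= 0:
        raise ValueError("no interior maximum unless the squared term is negative")
    return -b_exper / (2.0 * b_expersq)


# (EXPER, EXPERSQ) coefficients as printed in table D.
MALES = (0.03515, -0.00058)
FEMALES = (0.02278, -0.00042)

if __name__ == "__main__":
    for label, (b1, b2) in (("males", MALES), ("females", FEMALES)):
        print(label, round(experience_peak(b1, b2), 1))
```

Under these coefficients the estimated profile turns over at roughly 30 years of experience for men and roughly 27 years for women.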

The education variables included in the regression were
designed to take advantage of the ISDP personal history
questions on highest degree obtained, vocational training,
and types of courses taken in high school. They included
the following:

EDUC1   = With an advanced degree
EDUC2   = With a bachelor's degree
EDUC3   = High school graduate (reference group)
EDUC4   = Not a high school graduate, with a vocational
          training certificate
EDUC5   = Not a high school graduate, no vocational
          training certificate
COURSES = Number of selected academic courses completed
          in high school
Finally, a set of variables representing marital
history was included:
MARR1   = Married, no marital disruption (reference group)
MARR2   = With a marital disruption (ever widowed,
          divorced or separated)
MARR3   = Never married

The basic results of the survey are presented in tables A
and B, the means for all variables are shown in table C, and
the regression results are shown in table D. Results are shown
for White women and men as well as for all women and men in
order to facilitate comparisons with previous studies. Results
are also shown for men and women 30 years of age and over with
no familial interruptions as an alternative method of examining
the influence of work interruptions.

The  large  differences  between  the  sexes  in 
the  degree  of  work  attachment  are  highly 
visible  in  table  C.  Men  had,  on  the  average, 
about  19  years  of  work  experience  and  had 
spent  only  about  2  percent  of  their  potential 
work  years  away  from  work.  Women,  on  the  other 
hand,  had  14  years  of  work  experience  and  had 
spent  about  20  percent  of  their  potential  work 
years  away  from  work.  There  were  small  or 
insignificant  differences  between  men  and  women 
in  the  mean  values  of  the  other  experience  and 
interruption  variables  and  in  the  mean  values 
of  most  of  the  education  and  marital  history 
variables. 

Men,  however,  were  more  likely  than  women  to 
have  received  advanced  degrees  and  a  larger  pro- 
portion of  women  than  men  experienced.  The 
average hourly earnings of all women were $4.38,
about 63 percent as high as the average hourly
earnings of $6.92 for all men.

The  regression  results  confirm  the  importance 
of  experience  as  a  determinant  of  earnings.  The 
general  experience  variables  EXPER  and  EXPERSQ 
are  highly  significant  for  both  men  and  women 
and  are  important  relative  to  other  variables 
in  the  determination  of  hourly  earnings. 
Attachment to full-time work also has a significant
effect on earnings. The coefficients of
the experience variables show that the returns
to experience are greater for men than for
women.

The  interruption  variables,  in  general,  have 
a  negative  effect  on  earnings,  but  the  effect 
is  not  particularly  strong  or  consistent. 
The coefficient of TIME-AWAY is significant
for both men and women in the equations for
persons of all races, but is significant for
women only in the equations for White men and
women. Interruptions due to illness or disability
(DISAB) have a significant negative
effect on earnings in five of the equations,
but interruptions due to inability to find
work (UNEMP) have a significant negative effect in
only two of the equations.
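
These counts can be checked against the coefficients and standard errors printed in table D. The sketch below assumes a two-sided test at the 5 percent level with a critical ratio of 1.96; the paper does not state the significance level it used, so that threshold is an assumption, and the function name is illustrative.

```python
# (coefficient, standard error) pairs as printed in table D, in column order:
# employed males, employed females, White males, White females,
# males 30 and over, females 30 and over (no familial interruptions).
TABLE_D = {
    "UNEMP": [(-0.039, 0.018), (0.002, 0.018), (-0.028, 0.021),
              (0.002, 0.021), (-0.078, 0.018), (0.014, 0.041)],
    "DISAB": [(-0.125, 0.023), (-0.040, 0.028), (-0.144, 0.025),
              (-0.088, 0.032), (-0.143, 0.025), (-0.183, 0.044)],
}


def count_significant_negative(pairs, critical=1.96):
    """Count equations where the coefficient is negative and |coef/se| >= critical.

    The 1.96 default is an assumed 5 percent two-sided threshold.
    """
    return sum(1 for coef, se in pairs
               if coef < 0 and abs(coef / se) >= critical)


if __name__ == "__main__":
    for name, pairs in TABLE_D.items():
        print(name, count_significant_negative(pairs))
```

With that threshold the ratios agree with the text: DISAB is significantly negative in five equations and UNEMP in two.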

That  an  earnings  equation  contains  both 
experience  and  interruption  variables  that 
are  significant  is  evidence  that  a  depreciation 
effect  does  exist.   In  the  equation  for  men  of 
all  races,  the  experience  variables  EXPER, 
EXPERSQ,  and  FT  are  highly  significant  and  the 
interruption  variables  UNEMP  and  DISAB  are 
also  significant.  In  the  equations  for  women 
of  all  races,  the  experience  variables  and 
the  interruption  variable  TIME-AWAY  have 
highly  significant  effects  on  earnings.  The 
conclusion is that a depreciation effect does
exist and that information about work interruptions
will improve models that attempt to
explain earnings.

The coefficients of the education and marital
history variables are in line with expectations,
but two findings should be noted. First, the
coefficient of EDUC4 for men is less negative
than the coefficient of EDUC5. This finding
suggests that a vocational training certificate
has a positive effect on earnings. Second,
the coefficient of COURSES is highly significant
even though other measures of educational
attainment are also present in the equation.
That is, for the purpose of explaining earnings,
it is important to know about the types
of courses taken in high school even when we
already have information about years of school
completed and highest degree obtained.

Table C shows that the mean earnings of
women are only about 62 percent of the earnings
of men even when the group under study consists
of persons 30 years of age and over
with no familial interruptions. This differential
exists even though women in this
universe have approximately the same mean
years of experience as men. Table D shows why
the large differential exists even when the
mean values of experience are so close. Among
the men in this group, the coefficient of EXPER
is highly significant, but among women, the
coefficient is not significant.

In general, standardized regression coefficients
reveal that the work interruption
variables are less important than either the
general experience or education variables as
determinants of earnings. This holds true for
both men and women. So, while the work interruption
variables do show that a depreciation
effect exists, general work experience and
education are more critical determinants of
earnings.

The earnings equations which have been developed
for this report can be used to examine
the extent to which differences in work history
(experience and interruptions) are related to
the earnings gap between men and women. That
is, given the coefficients of their own equation,
what would the earnings of women be if
they had the same mean values as men for the
variables measuring experience and interruptions?
Table E shows that the effect of
assigning to women the mean experience and
interruption values of men is to reduce the
earnings gap by only 12 percent.
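
The mechanics of this counterfactual can be sketched as follows. The change in women's predicted log earnings is the sum, over the work history variables, of the women's coefficient times the difference between the men's and women's means. The coefficients are those printed in table D for employed females. The means are rounded figures quoted in the text (19 and 14 years of experience, 2 and 20 percent of potential years away). The EXPERSQ means are not given, so the square of mean EXPER is used as a placeholder, which understates the true mean of squares. UNEMP, DISAB and FT are omitted because their means are not quoted. The output is therefore illustrative only and does not reproduce table E.

```python
def counterfactual_change(coefs: dict, means_women: dict, means_men: dict) -> float:
    """Change in predicted log earnings when women are given men's means."""
    return sum(b * (means_men[k] - means_women[k]) for k, b in coefs.items())


def share_of_gap(change: float, log_gap: float) -> float:
    """Fraction of the male-female gap in mean log earnings that is closed."""
    if log_gap <= 0:
        raise ValueError("log_gap must be positive")
    return change / log_gap


# Women's coefficients as printed in table D (employed females).
FEMALE_COEFS = {"EXPER": 0.02278, "EXPERSQ": -0.00042, "TIME_AWAY": -0.128}

# Rounded means quoted in the text; EXPERSQ entries are placeholders (assumption).
MEANS_MEN = {"EXPER": 19.0, "EXPERSQ": 19.0 ** 2, "TIME_AWAY": 0.02}
MEANS_WOMEN = {"EXPER": 14.0, "EXPERSQ": 14.0 ** 2, "TIME_AWAY": 0.20}

if __name__ == "__main__":
    print(counterfactual_change(FEMALE_COEFS, MEANS_WOMEN, MEANS_MEN))
```

Dividing the result by the difference in mean log hourly earnings, which would come from table C, gives the share of the gap accounted for by work history.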

Problems in retrospective measures of work experience
and work interruptions

One of the goals of SIPP is to develop a
data base that can be used to investigate the
relationships among income, program participation,
and personal history including work
history. A certain amount of work history will
be obtained as persons are followed during their
time in the panel, but persons spend only 2 1/2
years in the panel. Some work history data may
be obtained by matching survey records with
Social Security earnings records, but matching
records takes time and the amount of work history
data that can be obtained from Social
Security records is limited. Until 1978, the
Social Security record contained information
on earnings during a quarter which were subject
to the Social Security tax. Therefore, if a
person's earnings met the Social Security tax
limit in the first quarter of the year, no earnings
data would appear for the remaining
quarters. Since 1978, the record contains annual
data on covered and noncovered earnings.

When  the  personal  history  supplement  was 
designed  for  the  third  wave  of  the  1979  ISDP, 
the  problem  was  to  develop  a  set  of  questions 
that  could  be  completed  in  2  or  3  minutes  and 
that  would  provide  an  indication  of  lifetime 
work  attachment.  The  approach  adopted  was  to 
attempt  to  identify  periods  lasting  6  months 
or  longer  when  the  person  did  not  work.  The 
ISDP  work  history  questions  are  reproduced  in 
Figure  A. 

There  are  obviously  very  great  problems 
in  trying  to  measure  lifetime  work  experience 
in  a  brief  set  of  questions.  The  data  from 
these  questions  do  have  a  considerable  amount 
of  face  validity,  but  it  seems  reasonable  to 
suppose  that  the  data  are  also  characterized 
by  response  problems.  One  way  of  identifying 
possible  problem  areas  is  to  cross-classify 
current  age  by  age  at  first  reason-specific 
interruption.   If  there  is  no  significant 
memory  loss,  then  one  would  expect  that  the 
proportion  of  persons  reporting  that  a  first 
reason-specific  interruption  took  place  while 
they  were  in  a  particular  age  interval  would 
be independent of their current age. Such a
cross-classification shows that memory loss
is a significant factor in the reporting of
first interruptions due to an inability to
find work: persons 21 to 29 were much more
likely than older persons to report that such
an interruption occurred before their 25th
birthday. There is some evidence of memory
loss in the reporting of first-time interruptions
due to disability, but not to the same
degree as for interruptions due to an inability to
find work. There is no evidence of memory
loss in the reporting by women of first-time
interruptions for familial
reasons. (The above conclusions are based on
the assumption that the age groups had similar
experiences.)
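
The cross-classification described above can be sketched as a small tabulation. The input format, a list of (current age, age at first interruption or None) pairs, is hypothetical, and the age groups are the ones used in tables A and B. The sketch is unweighted, whereas the survey estimates would use sample weights.

```python
from collections import defaultdict

AGE_GROUPS = [(21, 29), (30, 44), (45, 64)]


def age_group(age: int):
    """Return a label such as '21-29', or None if the age is out of range."""
    for lo, hi in AGE_GROUPS:
        if lo <= age <= hi:
            return f"{lo}-{hi}"
    return None


def share_reporting_early_interruption(records, cutoff=25):
    """For each current-age group, the share of persons who report a first
    interruption before age `cutoff`.

    `records` holds (current_age, age_at_first_interruption) pairs, with
    None when no interruption of that type was reported.
    """
    totals = defaultdict(int)
    early = defaultdict(int)
    for current_age, first_age in records:
        group = age_group(current_age)
        if group is None:
            continue
        totals[group] += 1
        if first_age is not None and first_age < cutoff:
            early[group] += 1
    return {group: early[group] / totals[group] for group in totals}
```

If recall were complete and the cohorts had similar experiences, the shares would be roughly equal across current-age groups; a share that falls with current age is the pattern read here as memory loss.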

The ISDP results were taken into consideration
when the time came to design the SIPP questions
on work history. In an effort to reduce
the problem of memory loss, respondents were
asked to begin with the earliest 6-month interruption
and work forward. The sequence also
attempted to determine the total number of
interruptions, then, for each period of interruption,
determine the duration of and reason
for the interruption. Because the SIPP sequence
asks for the total number of interruptions
and contains a "Don't Know" box for duration
of interruption, we expect to be able to
do a better job of imputing for item nonresponse.

Future  work 

We have finished the field collection operation
for the third wave of SIPP, the wave containing
the work history data, and are in the
process of designing processing specifications
so that a file can be prepared which contains
no item nonresponse.

There are some differences between the
work  history  data  collected  in  ISDP  and  the 
SIPP  work  history  data.  First,  the  SIPP  sample 
size  is  20,000  households,  about  twice  the  size 
of  ISDP.  Second,  the  SIPP  data  on  beginning 
and  ending  dates  of  interruptions  should  be 
more  complete  than  similar  data  from  ISDP. 
Third,  unlike  the  ISDP  third  wave  file,  the 
SIPP  file  will  have  data  on  job  and  occupation 
tenure. The SIPP file should be somewhat more
useful than the ISDP file, and should allow
users to expand the analysis by considering
other variables (e.g., job and occupation tenure)
and by considering the timing of work
interruptions, not just their existence and
duration.

FOOTNOTES 
1  Hourly earnings were calculated by dividing
total  earnings  for  the  3-month  period  by  the 
total  number  of  hours  worked. 

2  A  maximum  of  four  interruption  periods  could 
be  identified  for  each  of  three  possible 
reasons  for  interrupting. 

3  Potential  work  years  were  defined  as  age 
minus  years  of  school  completed  minus  6. 

4  The  ISDP  data  on  employer-specific  or  job- 
specific  measures  of  work  experience  (e.g., 
tenure  with  most  recent  employer/at  most 
recent  job)  were  collected  in  the  fifth 
wave  of  the  survey  and  were  not  available 
for this study.

Table A. Work Interruption History by Race, Spanish Origin, and
Selected Characteristics: Males


Characteristic 


Total 
(In  thousands 


Percent  with  one  or  more  Interruptions 
lasting  6  months  or  more  due  to— 


reasons 
surveyed 


Inability 

to  find 

•work 


Illness 

or 

disability 


Mean  percent  of 
potential  work 

years  spent  away 
from  work  for 

reasons  surveyed 


Males  21  to  64  years 
who  ever  worked  

RACE  AND  SPANISH 
ORIGIN* 

White  

Black  

Spanish  origin  

YEARS  OF  SCHOOL 
COMPLETED 

Less  than  12  

12  to  15 

16  and  over  

AGE  BY  YEARS  OF  SCHOOL 
COMPLETED 

21  to  29  years  

Less  than  12  

12  to  15  

16  and  over  

30  to  44  years  

Less  than  12  

12  to  15  

16  and  over  

45  to  64  years  

Less  than  12  , 

12  to  15  

16  and  over 


Professional,  technical 

or  managerial  

Sales  or  clerical  

Craftsmen  

Operatives  

Laborers  

Service  


49,381 
5,627 
3,220 


14,171 
29,761 
11,896 


16,048 
2,314 

10,104 
3,630 

19,106 
3,809 

10,278 
5,019 

20,674 
8,049 
9,378 
3,247 


15,040 
6,621 
12,825 
10,254 
5,832 
3,457 


24.2 
40.2 
34.9 


40.1 
24.7 
11.0 


20.5 
40.7 
20.8 
6.9 

23.4 
36.7 
25.2 
9.6 

11.9 
41.6 
28.5 
17.8 


14.7 
20.6 
28.8 
32.5 
37.9 
25.5 


15.2 
35.0 
22.7 


24.9 
17.3 
7.9 


18.0 
35.5 
18.5 
5.5 

16.2 
24.5 
17.8 
6.5 

17.7 
22.1 
15.6 
12.7 


10.2 
13.8 
18.7 
20.8 
27.6 
14.8 


2.1 

1.5 


1.5 
1.7 

.8 
3.0 


10.7 
10.7 
15.8 


20.3 
9.3 


3.4 
7.4 


8.5 
18.2 
8.0 
2.0 

18.2 
25.1 
17.1 
4.3 


13.6 
11.1 


2.3 
2.3 

2.9 
3.9 
5.6 
4.1 


0.1 
0.3 
0.3 


0.4 
0.2 
0.2 


0.2 
0.2 
0.2 


1/Persons of Spanish origin may be of any race.

Table B. Work Interruption History by Race, Spanish Origin, and Selected
Characteristics: Females

Columns:
(1) Total (in thousands)
(2)-(5) Percent with one or more interruptions lasting 6 months or more due to:
    (2) all reasons surveyed; (3) inability to find work;
    (4) family reasons; (5) illness or disability
(6)-(7) Mean percent of potential work years spent away from work for
    reasons surveyed: (6) value; (7) standard error

Characteristic                               (1)     (2)     (3)     (4)     (5)     (6)     (7)

Females 21 to 64 years who ever worked    57,258    71.9    14.2    64.1     9.2    30.9     0.2

RACE AND SPANISH ORIGIN 1/
  White                                   49,812    73.0    12.4    66.8     8.3    32.7     0.2
  Black                                    6,402    63.1    27.4    43.8    17.5    17.6     0.4
  Spanish origin                           3,014    75.0    23.6    62.4    12.9    27.6     0.6

YEARS OF SCHOOL COMPLETED
  Less than 12                            13,740    79.5    21.7    68.5    20.1    33.5     0.3
  12 to 15                                34,805    73.3    12.7    66.3     6.6    31.5     0.2
  16 and over                              8,713    54.3     8.6    48.6     2.6    24.2     0.4

AGE BY YEARS OF SCHOOL COMPLETED
  21 to 29 years                          16,804    53.1    17.0    42.5     3.5    20.7     0.3
    Less than 12                           1,948    70.6    23.2    61.7     5.9    30.9     0.8
    12 to 15                              11,650    56.8    18.4    44.9     4.1    22.3     0.3
    16 and over                            3,206    29.1     8.3    22.2     0.1     8.9     0.5
  30 to 44 years                          19,445    77.5    12.3    72.3     6.6    34.3     0.3
    Less than 12                           4,060    79.8    20.4    73.6    12.3    34.2     0.6
    12 to 15                              12,366    79.8     9.9    75.3     5.7    34.8     0.3
    16 and over                            3,018    65.1    11.4    58.5     2.9    32.2     0.7
  45 to 64 years                          21,011    81.7    13.8    73.8    16.1    35.8     0.3
    Less than 12                           7,733    81.5    22.0    67.6    27.8    33.7     0.4
    12 to 15                              10,789    83.7     9.8    79.0    10.3    37.7     0.4
    16 and over                            2,489    73.8     5.6    70.6     5.4    34.3     0.8

OCCUPATION GROUP OF USUAL JOB
  Professional, technical, or managerial  11,723    61.0     9.5    55.4     5.4    24.4     0.3
  Sales or clerical                       23,782    75.2    10.7    69.4     5.9    33.8     0.2
  (label not legible)                      8,447    74.8    22.2    62.9    14.5    29.4     0.4
  (label not legible)                        950    78.3    21.9    67.8    15.1    39.7     1.2
  (label not legible)                     10,543    74.2    19.7    63.4    16.5    32.1     0.4

1/Persons of Spanish origin may be of any race.

Table D. Coefficients of Regression of Log of Hourly Earnings on Specified Explanatory Variables
(Standard errors in parentheses)

Columns:
(1) Employed males
(2) Employed females
(3) Employed White males
(4) Employed White females
(5) Males 30 and over with no familial interruptions
(6) Females 30 and over with no familial interruptions

Variable   (1)        (2)        (3)        (4)        (5)        (6)

UNEMP      -.039       .002      -.028       .002      -.078       .014
           (.018)     (.018)     (.021)     (.021)     (.018)     (.041)
DISAB      -.125      -.040      -.144      -.088      -.143      -.183
           (.023)     (.028)     (.025)     (.032)     (.025)     (.044)
TIME-AWAY  -.312      -.128      -.068      -.155      (X)        (X)
           (.122)     (.025)     (.145)     (.028)
EXPER       .03515     .02278     .03791     .02495     .03382     .00937
           (.00175)   (.00184)   (.00189)   (.00200)   (.00306)   (.00600)
EXPERSQ    -.00058    -.00042    -.00065    -.00046    -.00056    -.00014
           (.00005)   (.00005)   (.00005)   (.00005)   (.00005)   (.00012)
FT          .216       .112       .254       .099       .363       .372
           (.032)     (.016)     (.035)     (.018)     (.064)     (.048)
EDUC1       .336       .358       .338       .322       .327       .301
           (.023)     (.028)     (.023)     (.030)     (.028)     (.053)
EDUC2       .179       .218       .181       .209       .231       .260
           (.016)     (.018)     (.018)     (.021)     (.021)     (.046)
EDUC4      -.069      -.146      -.002      -.120      -.026      -.415
           (.039)     (.048)     (.044)     (.067)     (.044)     (.092)
EDUC5      -.195      -.190      -.173      -.179      -.185      -.244
           (.016)     (.018)     (.016)     (.018)     (.018)     (.035)
COURSES     .038       .044       .034       .052       .045       .070
           (.005)     (.005)     (.005)     (.005)     (.005)     (.009)
MARR2      -.023       .016      -.038       .038      -.009      -.035
           (.014)     (.014)     (.014)     (.014)     (.016)     (.030)
MARR3      -.192      -.009      -.141      -.008      -.279       .029
           (.016)     (.018)     (.018)     (.018)     (.030)     (.035)
Constant   1.318      1.112      1.282      1.098      1.172       .993
R2          .24        .18        .22        .19        .20        .28

-  Represents zero.
X  Not applicable.

PANEL SURVEYS AS A SOURCE OF MIGRATION DATA


Donald  C.  Dahmann,  U.S.  Bureau  of  the  Census 


PREVIOUS  USES  OF  PANEL  SURVEYS  FOR  GEOGRAPHICAL 
MOBILITY  RESEARCH 

Panel surveys, the technique of using repeated
interviews with a sample of individuals that
remains constant, have been employed as a research
strategy to assist in understanding
processes with an inherent capacity for change
for at least half a century. Virtually all of
the earliest panel analyses were employed
to investigate change in opinions, and
particularly in preferences for political candidates
(Levenson, 1978; Rice, 1928; Lazarsfeld
and Fiske, 1938; and Berelson, Lazarsfeld, and
McPhee, 1954). From these beginnings, various
forms of panel analysis, and more generally
longitudinal analysis, have come to be utilized
in a wide variety of social science research
applications, and recently were singled out by
the Office of Federal Statistical Policy and
Standards as an approach to data collection and
analysis to which "much attention should be
devoted... in the development of [federal]
statistical programs for the 1980's" (U.S.
Department of Commerce, 1978, p. 321).

The real blossoming of panel surveys, both
in terms of numbers of surveys and their size,
occurred during the past two decades (and principally
during the 1970s). They were employed
as a means of monitoring and evaluating the
effects of a variety of federally sponsored
large-scale experiments and programs, and to
enhance our understanding of the factors
involved in a variety of social and economic
processes, e.g., educational attainment, socialization,
and labor force participation.

The principal federally sponsored experiments
including panel analysis components were
the several Income Maintenance Experiments,
the various Experimental Housing Allowance
programs, and the Urban Homesteading Demonstration.
Uses of panel analysis in each of
these programs for geographical mobility
research are discussed in the next section.

Several  large-scale  panel  surveys  were  also 
initiated  during  this  period  in  response  to  the 
need  for  basic  information  on  educational 
attainment,  labor  force  participation,  and 
social  stratification  processes  (Borus,  1982). 
Major  surveys  in  this  second  group  include  the 
National Longitudinal Survey of Labor Market
Practices,  begun  in  1966  and  sponsored  by  the 
U.S.  Department  of  Labor  with  distribution 
through  Ohio  State  University's  Center  for 
Human  Resource  Research  (popularly  referred  to 
as  the  Parnes  data;  Parnes,  1974;  Center  for 
Human  Resources  Research,  1974;  Bielby,  Hawley, 
and  Bills,  1977;  Parnes  and  Rich,  1980;  Leigh, 
1982;  Daymont,  1983);  the  Panel  Study  of  Income 
Dynamics,  conducted  by  the  University  of 
Michigan's  Institute  for  Social  Research  for 
the  U.S.  Department  of  Health  and  Human 
Services  since  1968  (Duncan  and  Morgan,  1982); 
Project  TALENT,  conducted  by  the  American 
Institutes  for  Research  (Wise  and  Steel,  1980); 
Youth  in  Transition  Project,  conducted  by  the 
University  of  Michigan's  Institute  for  Social 
Research  (Bachman  and  O'Malley,  1980);  National 


Longitudinal  Survey  of  High  School  Seniors, 
conducted  by  Research  Triangle  Institute  for 
the  National  Center  for  Education  Statistics 
(Eckland  and  Alexander,  1980);  the  Wisconsin 
Longitudinal  Study  (Sewell  and  Hauser,  1980); 
and  Explorations  in  Equality  of  Opportunity, 
initiated  by  the  Educational  Testing  Service 
(Alexander  and  Eckland,  1980). 

Questions that these latter surveys were
designed to address rarely included geographical
mobility as a major topical area. Much
attention to geographical mobility in questionnaire
design, however, was not required, as
geographical mobility data flowed as a natural
outcome of the follow-up of panel members.
Thus, general information on geographical
mobility derived from the panel design was sufficient
for relating movement to labor force
participation, educational attainment, social
and occupational stratification processes,
household changes with passage through the life
course, shifts in housing consumption, etc.

The  Experimental  Housing  Allowance  Programs 
and  the  Income  Maintenance  Experiments,  on  the 
other  hand,  were  both  specifically  interested 
in  geographical  mobility  as  an  integral  element 
of  evaluating  the  effectiveness  of  the  programs. 
Both programs, for instance, were concerned with
the effect upon, and role of, residential mobility
in relation to patterns of housing consumption,
and the Income Maintenance Experiments
were further concerned with the role of migration
as a response to income guarantees among
lower-income households. The fact that information
on the various forms of geographical
mobility flowed directly from the panel design
of these surveys without specific geographical
mobility questionnaire items demonstrates the
capacity of panel surveys to assist our understanding
of the role of geographical mobility in a
variety of social and economic circumstances.

Experimental Housing Allowance Program

The Experimental Housing Allowance Program,
initiated by the Housing and Urban Development
Act of 1970, was undertaken to establish
empirical  evidence  of  the  effects  of  housing 
allowances,  and  of  the  transfer  of  small 
amounts  of  unrestricted  funds  to  lower-income 
households on housing consumption. The following
questions highlight some of the experiment's
major research goals. Would households
spend the money on housing? Would the money be
used to improve conditions of their current
dwelling? Would households move to other
neighborhoods? What would be the local housing
market impact of such an infusion of funds? To
answer such questions, three programs (the Housing
Allowance Supply Experiment, the Housing Allowance
Demand Experiment, and the Administrative Agency
Experiment) eventually enrolled more than
30,000 households at twelve sites throughout the
country at a cost in excess of $160 million
(Friedman and Weinberg, 1982; 1983; Bradbury
and Downs, 1981; Struyk and Bendick, 1981).
Evaluation of these individual programs produced
several longitudinal analyses, but only
one that utilized individual households as its
unit of observation. Other longitudinal analyses
took housing units and neighborhoods as
their units of observation (Hillestad and
McDonald, 1983), or obtained retrospective
geographical mobility data for individuals
(McCarthy, 1983). Individuals in both control
and test groups of the Housing Allowance Demand
Experiment were traced for three years to observe,
among other things, actual patterns of
residential mobility and what changes, in terms
of housing consumption and residential dispersion
(and therefore desegregation), may have
resulted from such moves (Rossi, 1981;
Hamilton, 1983).

Similar  panel  analyses  were  utilized  in 
evaluating  the  effects  of  one  other  important 
housing  experiment  initiated  at  about  the  same 
time,  the  U.S.  Department  of  Housing  and  Urban 
Development's  Urban  Homesteading  Demonstration. 
This  effort,  begun  as  a  demonstration  and  later 
established  as  a  regular  program,  transferred 
HUD-owned  properties  to  local  control  in  23 
cities  (U.S.  Department  of  Housing  and  Urban 
Development, 1977). Its research agenda included
inquiry into the effects of residential
mobility on patterns of local housing consumption,
this time with specific reference to the
displacement of low-income households from
housing they could no longer afford as the
result of the HUD-owned properties being
returned to the market (Schnare, 1979).

Income Maintenance Experiments

Another of the recent massive social experiments
that included panel analysis as a research
component is the series of income maintenance
experiments begun in 1967, the first
large-scale social experiments to be conducted
in the United States: the New Jersey
Income Maintenance Experiment (Kershaw and
Fair, 1976; Watts and Rees, 1977; Pechman and
Timpane, 1975), the Rural Income Maintenance Experiment
(Bawden and Harrar, 1975; Palmer and
Pechman, 1978), the Gary Income Maintenance Experiment
(Journal of Human Resources, 1979), and
the Seattle and Denver Income Maintenance
Experiment (Journal of Human Resources, 1980).
Each of these four programs addressed various
aspects of one basic issue: how much would a
nationwide guaranteed income cost, and to what
extent would families reduce their labor force
participation (and therefore earnings) in response
to such payments? As with the massive
housing  allowance  experiments,  research  agendas 
of  the  income  maintenance  experiments  did  not 
include  geographical  mobility  as  a  specific 
focus.  Nonetheless,  the  panel  designs  employed 
in  the  evaluations,  which  traced  families 
(households)  over  a  three-year  period,  produced 
their own mobility data. Analyses were undertaken
of both major forms of geographical
mobility: (1) migration, specifically
rates of movement from the experimental site to
other labor markets, and (2) residential mobility,
change of one's dwelling to consume a
different bundle of housing services (quality
of dwelling, neighborhood services, etc.).
Data  to  investigate  both  of  these  extremely 
important  outcomes  of  the  decision  to  move 
were  readily  derivable  from  the  panel  design  of 
the  surveys  used  to  evaluate  the  experiments. 


Findings with regard to migration (spatial
adjustments in labor force participation) may
be summarized as follows: (1) migration out of
the experimental site's labor market was significantly
increased for married white males and
females but not for married black males and
females, and (2) outmigration was to locales
with generally lower wage rates and with better
living environments. Work hours in the new
locations were generally fewer than previously,
suggesting either that persons worked fewer
hours in their new locations because of their
additional income or that their search for a
"satisfactory" job in the new locale took some
time (Keeley, 1980).

With regard to residential mobility, it was
discovered that (1) households moving to a new
address generally improved their housing situation
(Wooldridge, 1977; Kaluzny, 1979), and (2)
the effects of income assistance as a means of
enabling renter households to move into a home
of their own were mixed (Wooldridge, 1977;
Poirier, 1977).

Panel Study of Income Dynamics

A more archetypical panel study, at least in
traditional terms, is the Panel Study of Income
Dynamics (PSID) conducted by the Institute for
Social Research of the University of Michigan
for the U.S. Department of Health and Human
Services. Initiated during the same period as the
Income Maintenance Experiments (the Great Society
era of the 1960s), this panel survey is
now in its 17th year of collecting annual
information from a representative national
sample of about 6,000 families and 15,000
individuals (Morgan and Smith, 1969). The
survey has produced a massive body of data and a
massive array of findings (Duncan and Morgan,
1982; Morgan, 1974; Duncan and Morgan,
1975-1980; and Hill, Hill, and Morgan, 1981),
and has even outlived its original sponsoring
federal agency, the Office of Economic Opportunity.

Panel Study of Income Dynamics data have
been utilized to address questions in both of
the two major realms of geographical mobility
research. They have also been utilized to consider
more basic geographic mobility research
questions: timing of moves through the life
course, relationships between desires or
expectations to move and actual movement, and
other similar questions that are intrinsic to
the geographical mobility processes. The rich
set  of  personal  attribute  and  attitudinal 
variables  in  the  Michigan  panel  has  enabled 
residential  mobility  research  to  be  framed  in 
behavioral  terms,  whereby  households  are  seen 
as  possessing  specific  desires  and  preferences 
with  respect  to  moving.  Structural  elements  of 
the  participation  system — income  levels  and 
purchasing  power,  housing  costs,  forced  reloca- 
tion, etc. — in  this  frame  of  reference  then 
serve  to  assist  or  hinder  actual  patterns  of 
mobility,  and  therefore  preference  fulfillment 
(Roistacher,  1974;  1975;  Goodman,  1974;  Duncan 
and  Newman,  1975;  1976;  Newman  and  Duncan, 
1979;  and  Newman  and  Owen,  1982). 

Use of these panel data has enabled
researchers to examine the characteristics of
migrants in various interregional migration
systems during the 1970s (Kim, 1980) and the
patterns and consequences of repeat moves (Newman
and Ponza, 1981), and to initiate structuring of
general models of mobility (Morgan, 1977).
Migration  as  an  act  resulting  in  the  readjust- 
ments of  labor  markets  has  also  been  explored 
both  in  terms  of  causes,  such  as  the  effects  of 
unemployment  on  movement  (DaVanzo,  1978),  and 
of  consequences  for  individuals,  in  terms  of 
income  and  occupational  change  (Harris,  1981). 
THE SURVEY OF INCOME AND PROGRAM PARTICIPATION
AS A SOURCE OF MIGRATION DATA

The Survey of Income and Program Participation
(SIPP), a general purpose, large scale
(25,000 household), nationally representative
sample survey, has been undertaken by the U.S.
Bureau  of  the  Census  primarily  to  provide:  (1) 
improved  data  on  the  economic  situation  of 
individuals  and  households  and  (2)  information 
on  federal  and  state  income  transfer  and  social 
program  participation.  Individuals  are  inter- 
viewed every  four  months  for  the  life  of  a 
panel. In the case of the first (1984) panel,
this will result in a total of 9 waves of
interviews for three-quarters of the panel and
8 waves for the remaining quarter. Initial
interviews for the 1984 panel were conducted in
October 1983 (with a reference period of
July-September 1983); the final wave of interviews
for this panel will be conducted in May
1986 (with a reference period of
January-April 1986). Current plans for a second (1985)
panel call for a somewhat smaller sample size
(20,000 households) and eight waves of
interviews, which will begin in January 1985.
The  earlier  review  of  geographical  mobility 
research  topics  explored  with  data  from  panel 
surveys  presages  the  types  of  research  we  might 
expect from SIPP. In terms of duration of the
panel, SIPP data will be much like those that
were derived from the several large scale
experiments; both cover a period not exceeding three
years. In terms of geographical mobility
research, therefore, data from both sources may be
used  to  explore  change  over  only  a  relatively 
short  period  of  time.  Several  SIPP  character- 
istics make  it  a  close  relative  of  the  Panel 
Study  of  Income  Dynamics  as  well:  (1)  its 
sample  is  national  (though  larger)  rather  than 
being  limited  to  selected  settlements,  and  (2) 
more  waves  of  interviews  are  being  conducted, 
which  will  provide  better  data  for  establishing 
joint incidence of movement with other events
(life course, employment, etc.). The
geographical mobility research derived from the
panel surveys discussed earlier suggests that
attention  must  be  given  to  two  specific  aspects 
of  such  surveys:  first,  the  periodicity  of 
waves  and  overall  duration  of  the  panel;  and 
second,  the  substantive  nature  of  data  col- 
lected during  each  wave.  SIPP  is  unique  in 
the  short  span  of  time  between  waves — four 
months.  This  design  characteristic  makes  it 
particularly  valuable  as  a  means  of  matching 
residential  shifts  and  migration  with  other 
life  events  such  as  marriage,  divorce,  expan- 
sion and  contraction  of  household  size  in 
general,  loss  of  job,  change  in  job,  and  the 
like. No previous national survey has provided
such a fine temporal scale for establishing the
joint incidence of geographical movement with
important employment and life events.

The  fact  that  each  panel  collects  data  for 
2  1/2  years  presents  both  advantages  and 
disadvantages. As SIPP panels will not be
maintained for years or even decades, as have the
several  major  panel  surveys  focusing  on  changes 
in  labor  force  participation,  educational 
attainment,  and  social  mobility  through  the 
life  course,  analysis  of  the  role,  conse- 
quences, and  duration  of  effects  of  geograph- 
ical mobility  through  major  stages  of  the  life 
course  is  not  feasible.  Nonetheless,  2  1/2 
years  (plus  the  fact  that  a  large  number  of 
waves  will  be  conducted)  is  quite  sufficient  to 
establish  both  immediate  and  some  intermediate- 
term  effects  associated  with  geographical 
mobility  on  topics  of  concern  such  as  the 
spatial  restructuring  of  labor  markets.  The 
duration of SIPP panels also provides a
reasonable amount of time to relate the expectations
of individuals toward mobility to actual
patterns of movement.

Once  we  have  accustomed  ourselves  to  the 
fact  that  the  act  of  tracking  those  who  move  in 
a  panel  survey  provides  migration  data,  then 
what must be considered in addressing
geographical mobility questions is the basic substance
of the questionnaire administered prior to and
following the move. In the case of SIPP we are
provided  with  a  wealth  of  relevant  migration- 
related  information:  labor  force  participation 
and  employment,  industry  and  occupation,  work 
history,  education,  health  conditions  and 
disability,  household  composition,  and,  of 
course,  income.  As  the  same  questionnaires  are 
administered  at  the  same  times  to  nonmovers, 
the  opportunity  exists  for  comparing  the  situa- 
tions of  movers  and  nonmovers  directly. 

In  consideration  of  these  several  points, 
what  should  we  be  looking  to  SIPP  for  in  terms 
of  geographical  mobility  research?  First,  I 
think  that  we  can  expect  better  data.  For 
decades  the  Current  Population  Survey  (CPS)  has 
served  as  our  national  metric  establishing 
levels  of  movement  among  the  various  components 
of  the  nation's  settlement  system,  among 
regions,  and  among  subpopulations  of  the 
nation's  peoples.  All  CPS  geographical  mobil- 
ity data  are  collected  retrospectively — some- 
times asking  respondents  to  refer  to  an  event 
that  occurred  one  year  ago,  sometimes  five 
years ago. These data, like all retrospective
data, are subject to biases introduced by
the distorting effects of memory loss,
dissonance reduction (rationalization), and the
like. How do SIPP data compare with CPS
data?  Will  SIPP's  multiple  waves  of  data  col- 
lection enable  us  to  specify  the  overall 
effects  of  repeat  movers  on  mobility  statistics 
in  a  way  that  cross-sectional  data  do  not? 
What will differences between the two surveys'
geographical mobility data tell us? The fact
that information on movement (and non-movement)
and  a  wide  array  of  life  events  will  be  col- 
lected almost  as  they  occur  (specific  to 
within  four  months)  is  one  of  SIPP's  best  fea- 
tures from  the  perspective  of  geographical 
mobility research. Our capacity to specify the
relationships between movement and such events
as the loss of a job, termination of the receipt
of unemployment benefits, marriage, divorce,
etc. has never been better.

A set of supplemental migration questions,
which should be administered to all individuals
for at least one (preferably early) wave of
interviewing, should also be considered.
First,  respondents  should  be  asked  a  set  of 
mobility  preference  questions,  to  relate 
desires  and  expectations  of  movement  with 
patterns  of  actual  mobility.  Secondly,  retro- 
spective questions  on  one's  general  residential 
history  should  be  asked  so  that  subsequent 
moves  may  be  related  to  previous  patterns  of 
mobility  and  locations. 

One  further  aspect  of  SIPP's  design  that 
should  not  be  overlooked  when  thinking  about 
geographical  mobility  research  is  its  capacity 
to  provide  information  on  the  locales  of  origin 
and  destination  of  movers  (and  nonmovers  as 
well).  The  ability  of  SIPP  to  provide  infor- 
mation on  conditions  in  both  the  labor  markets 
that  migrants  leave,  and  those  to  which  they 
move,  is  of  particular  concern  when  wishing  to 
understand  the  spatial  differentiation  of  labor 
markets  and  to  ascertain  the  causes  of  subnati- 
onal  (regional)  patterns  of  employment  growth 
and  decline.  What,  for  instance,  are  the  re- 
lationships between  sending  and  receiving 
markets  in  terms  of  unemployment  rates,  wage 
levels,  etc.?  Are  these  structural  situations 
consequential  in  terms  of  employment?  Are  dif- 
ferent mechanisms  operating  for  blue  collar  and 
white  collar  migrants  that  such  differences 
articulate?  These  are  some  of  the  questions 
that  should  guide  attempts  to  maximize  the 
utilization  of  SIPP  data  for  geographical 
mobility  research. 
CONCLUSION 

The  Survey  of  Income  and  Program  Participa- 
tion enables  us  to  explore  new  questions  con- 
cerning both  of  the  major  forms  of  geographical 
mobility — residential  mobility  and  migration. 
With  particularly  good  income  and  public  program 
participation  data,  good  specification  of  the 
timing  of  movement  with  significant  life  events, 
and  (potentially)  good  market  characteristics 
data,  SIPP  is  ideally  suited  to  address  a 
multitude  of  housing  consumption  questions. 
With  good  information  on  participation  in 
federal  and  state  sponsored  programs,  excep- 
tionally good  income  data,  and  (potentially) 
good  information  on  the  characteristics  of 
labor  markets,  SIPP  promises  to  be  an  incom- 
parable research  tool  for  questions  that  have 
heretofore  simply  gone  unasked  regarding  mi- 
gration, and  particularly  as  it  relates  to 
readjustments  of  the  spatial  dimension  of  labor 
markets. 

We  must  also  be  prepared  to  take  advantage 
of  the  serendipitous  benefits  of  timing.  In 
this  regard,  the  availability  of  SIPP  data  and 
recent advances in analytical techniques
provide us with opportunities that were
nonexistent even a decade ago. With regard to
analytical techniques, I am thinking particularly of
those developed during the 1970s for analyzing
categorical data (Goodman, 1978; Bishop,
Fienberg, and Holland, 1975), and their
specific application to the analysis of change in
mobility and panel data (Hauser, 1979; Duncan,
1981; Fienberg, 1980). The richness of SIPP
data provides a wonderful opportunity to
utilize fully the analytical advances brought
about by these techniques to answer a myriad of
geographical mobility questions.

The  Great  Society  programs  of  the  1960s 
pushed  social  scientists  as  never  before  to  ask 
questions  about  American  society  and  its 
economy.  In  response  to  these  demands,  new  and 
better  data  were  collected,  new  analytical 
techniques  were  developed,  and  new  research 
agendas  established.  Much  was  learned  from 
these  efforts  about  the  causes,  the  roles 
performed  by,  and  effects  of  geographical 
mobility  on  the  nation's  economic  and  social 
structure.  SIPP  represents  a  logical  outcome 
of  advances  in  social  science  data  collection 
that  began  in  the  1960s  and  an  important  new 
opportunity  for  geographical  mobility  research. 
I  invite  your  comments  on  ways  that  we  at  the 
Census  Bureau  can  enhance  this  new  survey's 
utility  for  answering  your  geographical  mobil- 
ity questions. 
SIPP AND CPS LABOR FORCE CONCEPTS: A COMPARISON
by Paul Ryscavage, Bureau of the Census


Background 

The  Survey  of  Income  and  Program  Participation 
(SIPP)  is  a  new  Census  Bureau  survey  designed  to 
give  policymakers,  researchers,  and  the  public 
an  in-depth  look  at  the  economic  situation  of 
persons  and  households  in  the  United  States.  Its 
primary  purpose  is  to  collect  data  on  the  kinds 
and  amounts  of  income  received  by  persons  and 
the  extent  of  their  participation  in  government 
income  transfer  programs,  such  as  Social  Security 
and  Aid  to  Families  with  Dependent  Children.  The 
full  scope  of  SIPP  as  a  source  of  information  on 
the  well-being  of  our  society,  however,  is  still 
being   realized. 

One  important  byproduct  of  SIPP  is  information 
on  the  labor  force  activity  of  individuals.  Work- 
ing or  not  working  is  frequently  associated  with 
one's  economic  situation  and  also  one's  partici- 
pation or  nonparticipation  in  social  welfare 
programs. An obvious illustration is the
relationship between job loss and the receipt of
unemployment insurance payments.

In  the  development  of  the  SIPP  labor  force 
questions,  an  effort  was  made  to  make  them  con- 
ceptually similar  to  those  in  the  Current  Popu- 
lation Survey  (CPS)  which  is  the  survey  used  to 
collect  the  Federal  government's  official  labor 
force  statistics.  The  CPS,  in  operation  since 
1940,  was  developed  for  the  sole  purpose  of  esti- 
mating the  numbers  of  employed  and  unemployed 
persons   in  the  country. 

At  the  core  of  the  CPS  labor  force  data  is  the 
"activity concept."1/ Basically, the concept
amounts  to  identifying  persons'  activities  in 
relation  to  the  labor  market  during  a  specific 
period  of  time.  In  the  CPS  the  period  of  time  is 
one  week.  Persons  in  the  adult  population  can 
then  be  sorted  into  three  mutually  exclusive 
groups  depending  on  their  activity  during  the 
week: working, not working but seeking work, and
neither  working   nor  seeking  work. 

While many refinements have been made to the
activity concept and the operation of the CPS
through the years, the keystone of the Nation's
employment and unemployment estimates--activity
during a specific reference week--has not
been changed. The concept and the CPS have been
reviewed periodically by Presidentially
appointed commissions to ensure their soundness.
The most recent review was by the National
Commission on Employment and Unemployment
Statistics (NCEUS) in the late 1970's.2/ Although the
Commission recommended some modifications of
definitions used in the CPS, it pronounced the
basic activity concept sound.3/

Compared to the CPS, SIPP is in its infancy.
Its genesis was the Income Survey Development
Program begun in the mid-1970's by the Department
of Health, Education, and Welfare.4/ Despite its
newness, SIPP has great potential for casting
light not only on the nature of income dynamics
but also on how labor force activity is related
to it. Indeed, the NCEUS suggested there was a
need "to link labor force experience with income
data" so as to add a qualitative dimension to
labor force statistics.5/ SIPP data will show on
a regular basis how well the labor market is
providing for the economic well-being of workers
and their households.

An obvious question among labor force analysts
is how the SIPP and CPS labor force data will
compare. Although we can't answer that question
at this time because SIPP labor force data are
still being processed, we can compare SIPP and
CPS labor force concepts.6/ More specifically, we
can examine how the activity concept is applied
in both surveys. We begin by briefly
reviewing some of the survey design characteristics
of  the  SIPP  and  CPS  and  then  compare  specific 
SIPP  and  CPS  labor  force  definitions.  A  conclud- 
ing section  of  the  paper  discusses  potential 
uses  of  SIPP  labor  force  data. 

Survey  Design  Characteristics  of  SIPP  and  CPS 

Labor  force  analyses  (as  well  as  other  kinds 
of  analyses)  are  frequently  limited  because  the 
data  being  analyzed  come  from  surveys  with  unique 
survey  design  characteristics.  For  example, 
small  sample  size  often  creates  difficulties  for 
analysts.  Three  survey  design  features  of  SIPP 
and  CPS  which  are  important  from  an  analytical 
standpoint  are  discussed  below. 

Samples.  Significant  differences  exist  in  the 
sample  size  and  design  of  SIPP  and  CPS.  SIPP  is 
a  longitudinal  panel  survey  comprised  originally 
of  26,000  households  located  in  174  areas  around 
the  country.  The  sample  is  divided  into  four  ro- 
tation groups  and  households  in  each  group  are 
interviewed  every  four  months  for  approximately 
two  and  one-half  years.  The  first  rotation  group 
was  interviewed  in  October  1983,  and  interviews 
were  conducted  in  the  second,  third,  and  fourth 
rotation  groups  in  November,  December,  and 
January, respectively. This staggered sample
design produces a cycle or wave of interviewing that
takes four months to complete, after which the
rotation groups are reinterviewed in the same
sequence. The Census Bureau plans to introduce
another  panel  of  approximately  20,000  households 
in  January  1985  and  another  20,000  household  pan- 
el in  January  1986.  Consequently,  SIPP's  sample 
size  will  grow  as  panels  are  overlapped,  increas- 
ing the  reliability  of  the  estimates. 
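
The staggered rotation pattern described above can be sketched in a few lines of Python. This is an illustration only, not Census Bureau software: the function names are hypothetical, and the sketch assumes four rotation groups, one group interviewed per month, and a reference period consisting of the four calendar months preceding the interview month.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """Return the first day of the month n months after (or before) d."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)

def sipp_schedule(first_interview: date, n_groups: int = 4, n_waves: int = 3):
    """List (wave, rotation_group, interview_month, ref_start, ref_end).

    Assumption: one rotation group is interviewed each month, so a wave
    takes n_groups months and each group returns every n_groups months.
    """
    rows = []
    for wave in range(1, n_waves + 1):
        for group in range(1, n_groups + 1):
            offset = (wave - 1) * n_groups + (group - 1)
            interview = add_months(first_interview, offset)
            # Reference period: the four months before the interview month.
            rows.append((wave, group, interview,
                         add_months(interview, -4), add_months(interview, -1)))
    return rows

if __name__ == "__main__":
    for wave, group, interview, start, end in sipp_schedule(date(1983, 10, 1)):
        print(wave, group, interview.strftime("%b %Y"),
              start.strftime("%b %Y"), "to", end.strftime("%b %Y"))
```

Under these assumptions the first rotation group is interviewed in October 1983 and again in February 1984, while the fourth group is first interviewed in January 1984.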

The CPS is basically a cross-sectional survey,
but it also has a longitudinal dimension.7/ It
is a much larger survey, composed of 60,000
households located in 629 areas across the
country. The CPS sample is divided into eight
rotation groups, but unlike the staggered sample
design of SIPP, all rotation groups are in
operation in a single month. The longitudinal aspect
of  CPS  results  from  the  rotation  group  pattern  in 
which  a  household  is  in  the  sample  for  four  con- 
secutive months,  out  for  eight  and  then  back  in 
for  four  more  months.  This  pattern  allows  three- 
quarters  of  the  households  to  be  the  same  from 
month-to-month  and  one  half  to  be  the  same  over 
the  year.  This  is  important  because  labor  force 
analyses  of  CPS  data  conducted  by  the  Bureau  of 
Labor  Statistics  (BLS)  concentrate  on  month-to- 
month  and  year-to-year  changes. 
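
The three-quarters and one-half overlap figures follow directly from the four-in, eight-out, four-in pattern. The short Python sketch below checks the arithmetic; it is illustrative only, uses hypothetical function names, and assumes that a new rotation group of equal size enters the sample every month.

```python
def in_sample(entry_month: int, month: int) -> bool:
    """True if a group that entered in entry_month is interviewed in month.

    Pattern: in sample for 4 months, out for 8, back in for 4 more.
    """
    k = month - entry_month
    return 0 <= k <= 3 or 12 <= k <= 15

def overlap_share(month: int, lag: int) -> float:
    """Share of groups interviewed in `month` that were also interviewed
    `lag` months earlier (assumes equal-sized groups)."""
    current = [e for e in range(month - 15, month + 1) if in_sample(e, month)]
    repeated = [e for e in current if in_sample(e, month - lag)]
    return len(repeated) / len(current)

if __name__ == "__main__":
    print(overlap_share(100, 1))   # month-to-month overlap
    print(overlap_share(100, 12))  # year-to-year overlap
```

With eight groups in sample in any month, six were also interviewed the month before and four were interviewed twelve months before, which gives the 75 percent and 50 percent overlaps cited in the text.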

Two  problems  for  labor  force  analysts  who  use 
household  survey  data  are  biases  resulting  from 
the sample's design and from interview
nonresponse. Rotation group bias has always been a
problem in the CPS and it has received much
attention over the years.8/ Theoretically, each CPS
rotation  group  should  produce  the  same  estimates, 
except  for  random  differences  due  to  sampling 
variability.  The  estimate  of  unemployment  from 
the  first  rotation  group,  however,  is  usually 
greater  than  the  estimate  based  on  all  rotation 
groups.9/ (Recently, the difference has averaged
about  six  percent.)  The  reason  for  the  difference 
has  never  been  isolated.  Because  all  SIPP  rota- 
tion groups  in  a  SIPP  panel  have  been  in  the 
sample  for  the  same  amount  of  time,  this  type  of 
bias  will  not  be  immediately  observable.  It 
will  be  possible  to  observe  after  the  introduc- 
tion of  the  1985  SIPP  panel  in  January  1985,  how- 
ever, since  rotation  groups  of  different  sample 
ages  will  then  be  in  operation  at  the  same  time. 
A  second  bias  problem  involves  survey  nonre- 
sponse--unit  or  total  nonresponse  and  item  non- 
response.  In  the  CPS,  the  unit  noninterview  rate 
has  hovered  around  the  four  to  five  percent  mark 
in  recent  years;  item  nonresponse  rates  vary  by 
item, but in the March CPS, income questions
generally have the highest nonresponse rate.10/ The
Census  Bureau  has  developed  noninterview  adjust- 
ments and  imputation  schemes  for  dealing  with 
these  problems.  While  the  first  panel  in  SIPP  is 
less  than  a  year  old,  it  appears  that  the  unit 
noninterview  rate  for  the  first  SIPP  interviews 
is  about  the  same  as  in  the  CPS.  (A  cumulative 
noninterview  rate  will  be  available  from  SIPP  as 
subsequent waves of interviewing are completed.)
Item nonresponse in SIPP is presently being
investigated at the Census Bureau.11/ Because both
labor  force  and  income  questions  are  asked  at 
the  same  time,  the  quality  of  the  SIPP  labor 
force  data  may  be  affected. 

Survey  eligibility  and  coverage.  Respondent  eli- 
gibility and  coverage  are  somewhat  different  in 
SIPP  and  CPS.  In  SIPP  all  household  members 
15  years  of  age  and  over  are  eligible  to  be 
interviewed  and  all  eligible  persons  are  inter- 
viewed if  present  at  the  time  of  the  interview. 
If an eligible person is not home, a "proxy"
interview is obtained from a knowledgeable person;
otherwise a return visit is scheduled. In the CPS
the  age  of  eligibility  is  16  years  and  over  (data 
are  also  collected  for  14  and  15  year  olds);  one 
adult  household  respondent  may  answer  the  ques- 
tions for  all  household  members. 

Telephone  interviewing  is  also  handled  differ- 
ently in  the  two  surveys.  Telephone  interviews  in 
SIPP  must  have  prior  regional  office  approval, 
except  in  the  case  of  information  not  obtained 
in the course of the interview. In the CPS,
telephone  interviews  are  permitted  in  the  second, 
third,  fourth,  sixth,  seventh,  and  eighth  month 
in  which  the  households  are  in  sample. 

Another  difference  concerns  the  treatment  of 
the  Armed  Forces.  In  the  monthly  CPS,  members  of 
the  Armed  Forces  living  in  households  are  not 
eligible  for  interview.  In  SIPP,  however,  such 
individuals  are  interviewed  as  long  as  they  are 
stationed  in  the  area  and  usually  reside  at  the 
address  visited.  (Both  surveys  exclude  inmates  of 
institutions,  such  as  persons  in  prisons  or  con- 
valescent homes.) 

Lastly,  and  most  significant  for  many  analy- 
ses, members  of  households  in  the  SIPP  sample  who 
move between interviews are followed and further
interviews attempted. Sample persons, however,
are  not  followed  when  they  have  been  institution- 
alized, become  a  member  of  the  Armed  Forces,  move 
outside  the  United  States,  or  move  more  than  100 
miles  from  a  SIPP  sampling  area.  In  the  CPS, 
movers  are  not  followed  and  this  has  been  a  con- 
straint on  many  longitudinal  labor  force 
analyses.12/

Reference periods. A fundamental difference
between SIPP and CPS--one that will probably
account for differences in labor force estimates
between the surveys--is the length of the reference
period.  CPS  interviews  are  conducted  in  all  rota- 
tion groups  each  month  in  the  week  containing  the 
19th, and all questions about labor force activity
are asked in reference to the previous week, which
contains the 12th (the survey week). (As will be
discussed,  this  one  week  reference  period  is  ex- 
tended to  four  weeks  in  the  case  of  jobseeking.) 
Depending  on  the  respondent's  answers  to  the 
questions,  household  members  are  classified  into 
one of three mutually exclusive groups: employed,
unemployed,  or  not  in  the  labor  force. 
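
The timing rule for the CPS can be illustrated with a small date calculation. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that survey weeks run from Sunday through Saturday, which the text above does not state.

```python
from datetime import date, timedelta

def week_containing(year: int, month: int, day: int):
    """Return (Sunday, Saturday) of the calendar week containing the date.

    Assumption: weeks run Sunday through Saturday.
    """
    d = date(year, month, day)
    # date.weekday() is 0 for Monday ... 6 for Sunday.
    sunday = d - timedelta(days=(d.weekday() + 1) % 7)
    return sunday, sunday + timedelta(days=6)

if __name__ == "__main__":
    reference_week = week_containing(1984, 8, 12)  # week containing the 12th
    interview_week = week_containing(1984, 8, 19)  # week containing the 19th
    print("reference week:", reference_week)
    print("interview week:", interview_week)
```

For August 1984, for example, the week containing the 12th runs from August 12 through August 18, and the interview week containing the 19th follows immediately after it.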

SIPP  interviews  are  conducted  in  one  of  the 
four  rotation  groups  each  month  during  the  first 
two  weeks  of  the  month.  The  labor  force,  income, 
and  program  participation  questions  relate  to  the 
four  previous  months.  Indeed,  the  labor  force 
questions  actually  refer  to  individual  weeks  dur- 
ing the  four  month  period.  During  this  time  a 
person  could  have  worked,  looked  for  work,  and 
been  outside  the  labor  force  at  different  times. 
In  other  words,  labor  force  classification  in 
SIPP  is  not  necessarily  mutually  exclusive  as  it 
is in the CPS.13/

Recall errors are potentially a greater
problem in SIPP than in the monthly CPS since
respondents are recalling activities over a much
longer period. For persons with a marginal
attachment to the labor market (teenagers, for
example), it may be very difficult to remember job
market activities. Although the recall period
in SIPP is long, it is not inordinately so. In the
supplement to the March CPS, persons are asked about
their  labor  market  activities  in  the  previous 
calendar  year--a  reference  period  extending  back 
3 to 15 months.14/ The annual work experience
statistics  have  been  published  by  the  BLS  and 
Census  Bureau  for  years. 

SIPP  and  CPS  Labor  Force  Definitions 

Because the reference periods in SIPP and CPS
are of different lengths, the activity concept is
applied differently in the two surveys. In the CPS,
persons  are  asked  a  specific  activity-type  ques- 
tion relating  to  the  week  containing  the  12th  of 
the  month.  In  sorting  out  the  possible  labor  mar- 
ket-related activities  into  mutually  exclusive 
groups,  a  priority  scheme  is  necessary  since  some 
individuals  may  have  been  involved  in  more  than 
one  activity.  The  first  or  highest  priority  is 
assigned  to  working.  As  long  as  a  person  worked 
for  pay  or  profit  for  one  hour  or  more  (or  15 
hours  or  more  without  pay  in  a  family  operated 
business),  the  person  is  considered  employed 
even  though  he  or  she  may  also  have  looked  for 
work  or  gone  to  school  or  done  something  else 
during  that  week. 

The  next  highest  priority  is  given  to  those 
persons  who  had  a  job  during  the  survey  week,  but 
were  temporarily  absent  from  it.  Although  this 
relaxes the activity concept slightly, it permits
a  more  accurate  counting  of  the  numbers  of  per- 
sons with  actual  job  commitments.  These  persons 
may  have  been  on  strike  or  ill  or  on  vacation  or 
absent  for  some  other  personal  reason,  but  since 
they  had  a  job  to  return  to  they  are  classified 
as  employed. 

The  third  priority  is  assigned  to  persons 
whose  activity  was  looking  for  work.  If  a  person 
in  the  survey  week  neither  worked  nor  had  a  job 
but  looked  for  one  at  some  time  within  the  past 
four  weeks  (and  was  currently  available  to  take 
one)  he  or  she  is  considered  unemployed.  Once 
again  the  activity  concept  is  relaxed  to  cover 
persons  who  may  not  have  looked  for  work  contin- 
uously because  they  were  waiting  to  be  recalled 
from  layoff  or  were  waiting  to  start  a  new  wage 
or  salary  job  within  30  days.  These  persons  too 
would  be  classified  as  unemployed.  All  other  in- 
dividuals not  fitting  into  this  classification 
scheme  are  considered  to  be  not  in  the  labor 
force.  Accordingly,  the  Nation's  civilian  non- 
institutional  population  age  16  and  over  can  be 
sorted  into  the  familiar  labor  force  categories 
shown  below. 

CPS Labor Force Categories

Civilian noninstitutional population age 16 and over
  Civilian labor force
    Employed
    Unemployed
  Not in the labor force
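
The CPS priority scheme described above can be written as a short decision rule. The Python sketch below is illustrative only and is not the Census Bureau's classification program; the record layout and field names are hypothetical and simply mirror the conditions given in the text.

```python
from dataclasses import dataclass

@dataclass
class WeekActivity:
    """Hypothetical summary of one person's survey-week answers."""
    paid_hours: float = 0.0            # hours worked for pay or profit
    unpaid_family_hours: float = 0.0   # unpaid hours in a family business
    has_job_but_absent: bool = False   # strike, illness, vacation, etc.
    looked_past_4_weeks: bool = False  # jobseeking in the past four weeks
    available_for_work: bool = False
    awaiting_recall: bool = False      # waiting to be recalled from layoff
    starts_job_within_30_days: bool = False

def cps_status(a: WeekActivity) -> str:
    """Apply the priorities in order: working, with a job but absent,
    looking for work (or layoff / new job pending), otherwise not in the
    labor force."""
    if a.paid_hours >= 1 or a.unpaid_family_hours >= 15:
        return "employed"
    if a.has_job_but_absent:
        return "employed"
    if a.looked_past_4_weeks and a.available_for_work:
        return "unemployed"
    if a.awaiting_recall or a.starts_job_within_30_days:
        return "unemployed"
    return "not in the labor force"

if __name__ == "__main__":
    # Worked one hour and also looked for work: working takes priority.
    print(cps_status(WeekActivity(paid_hours=1, looked_past_4_weeks=True,
                                  available_for_work=True)))
```

Because the tests are applied in a fixed order, each person falls into exactly one category even when more than one activity occurred during the week.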

In  SIPP  persons  are  not  asked  a  specific  act- 
ivity question  relating  to  the  previous  four 
months  because  it  is  very  possible  (even  more  so 
than  in  the  CPS)  that  an  individual  may  have 
worked,  looked  for  work,  or  done  something  else 
during  the  period.  Instead,  the  initial  question 
on  the  SIPP  questionnaire  concerns  whether  or  not 
an  individual  had  a  job  or  business  at  any  time 
during  the  previous  four  months.  In  other  words, 
the  activity  concept  is  tied  to  an  individual's 
either  having  or  not  having  a  job  in  the  refer- 
ence period.  For  those  who  had  jobs,  subsequent 
questions  are  asked  about  how  long  persons  had 
their  jobs  in  the  reference  period,  whether  they 
had  been  absent  from  them  and  why,  and  whether 
they  looked  for  work  or  were  on  layoff  when  they 
did  not  have  jobs.  For  persons  who  did  not  have 
jobs  during  the  entire  period,  questions  are 
asked  if  they  looked  for  work  or  had  been  on 
layoff  and  if  so,  for  how  long. 

Unlike  the  CPS  where  a  priority  scheme  is  re- 
quired to  classify  individuals  into  mutually  ex- 
clusive labor  force  categories,  in  SIPP  individ- 
uals may  have  experienced  more  than  one  labor 
force  status  in  the  four  month  reference  period. 

For  example,  a  person  may  have  had  a  job  for 
the  entire  period  but  was  temporarily  laid  off 
for  one  month  of  the  period;  or  a  person  may  have 
had  a  job  for  two  months  and  then  quit  to  look 
for  another  one;  or  a  person  may  have  been  out- 
side the  labor  force  for  a  month,  looked  for  a 
job  for  a  month  and  then,  having  found  one, 
worked  for  two  months.  Consequently,  the  SIPP 
labor  force  categories  shown  below  reflect  mul- 
tiple labor  force  statuses. 


SIPP Labor Force Categories

Noninstitutional population age 16 and over
  Persons with some labor force activity in period
    With a job the entire period
      Worked all weeks
      Missed some weeks
      Spent time on layoff
    With a job during part of the period
      Spent time looking for work or on layoff
    No job during the entire period
      Spent time looking for work or on layoff entire period
      Spent time looking for work or on layoff some weeks
  Persons with no labor force activity in period
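
One way to see how the SIPP categories arise is to summarize a person's weekly records over the reference period. The Python sketch below is a rough illustration under stated assumptions, not the SIPP processing system: the weekly record layout is hypothetical, and where a person with a job all period also reports layoff, the sketch arbitrarily reports "spent time on layoff" ahead of "missed some weeks".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Week:
    """Hypothetical record for one week of the reference period."""
    has_job: bool = False
    worked: bool = False
    looking_or_layoff: bool = False

def sipp_category(weeks: List[Week]) -> Tuple[str, Optional[str]]:
    """Return (main category, detail) for one person's reference period."""
    if not weeks:
        raise ValueError("at least one week is required")
    n = len(weeks)
    job_weeks = sum(w.has_job for w in weeks)
    search_weeks = sum(w.looking_or_layoff for w in weeks)
    if job_weeks == n:
        main = "with a job the entire period"
        if search_weeks > 0:
            return main, "spent time on layoff"
        if all(w.worked for w in weeks):
            return main, "worked all weeks"
        return main, "missed some weeks"
    if job_weeks > 0:
        main = "with a job during part of the period"
        detail = ("spent time looking for work or on layoff"
                  if search_weeks > 0 else None)
        return main, detail
    if search_weeks == n:
        return ("no job during the entire period",
                "looking for work or on layoff entire period")
    if search_weeks > 0:
        return ("no job during the entire period",
                "looking for work or on layoff some weeks")
    return "no labor force activity in period", None

if __name__ == "__main__":
    # Out of the labor force, then jobseeking, then working.
    person = ([Week()] * 4 + [Week(looking_or_layoff=True)] * 4
              + [Week(has_job=True, worked=True)] * 9)
    print(sipp_category(person))
```

Unlike the CPS rule, nothing here forces a single status: the categories record combinations of statuses experienced over the period.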

A closer look at specific SIPP and CPS labor
force definitions is presented below. (The definitions
are  discussed  under  headings  common  in  everyday 
usage  and  should  not  be  construed  as  CPS-specific 
labor  force  terminology.) 

Employment.  In  both  surveys,  employment  is  gener- 
ally defined  as  working  at  a  job  or  business  for 
pay  or  profit  at  some  time  in  the  reference  per- 
iod. A  job  is  considered  to  be  an  arrangement  for 
regular  work  for  pay  where  payment  is  in  cash 
wages  or  salaries,  at  piece  rates,  in  tips,  by 
commission,  or  in-kind  (meals,  living  quarters, 
supplies  received).  A  business  is  defined  as  an 
activity  which  involves  the  use  of  machinery  or 
equipment  in  which  money  has  been  invested  or  an 
activity  requiring  an  office  or  "place  of  busi- 
ness" or  an  activity  which  requires  advertising. 
Payment  may  be  in  the  form  of  profits  or  fees. 
Both  surveys  also  consider  persons  to  be  employed 
when  they  have  been  absent  from  their  jobs  be- 
cause of  illness,  vacation,  bad  weather,  labor 
dispute,  and  various  personal  reasons.  Unpaid 
family  work  is  considered  employment  when  it  con- 
tributes to  the  operation  of  a  farm  or  business 
run  by  a  member  of  the  same  household.  In  the 
CPS,  unpaid  family  work  must  have  lasted  for  15 
hours  or  more  during  the  reference  week,  but  in 
SIPP  there  is  no  hours  restriction. 

Unemployment. The definitions of unemployment in
both surveys are also very similar. In the CPS,
persons must have been without a job during the
reference week, and in SIPP they must have been
without a job for all or part of the reference
period; in addition, they must have been available
for work and have undertaken some specific
jobseeking activity. Jobseeking activity in the CPS
may have occurred any time in the previous four
weeks, while in SIPP it may have occurred any time
during the four months. If, in either survey,
jobseeking occurred while the person was working,
working takes precedence and the person
is considered employed.

Two  exceptions  to  the  above  rule  must  be 
noted.  The  first  is  the  case  of  the  person  who 
has  a  job  but  was  laid  off  and  the  second  is  the 
person  who  is  to  begin  a  new  wage  or  salary  job 
within  30  days.  Both  persons  are  considered  un- 
employed. In  the  CPS  these  persons  must  have  been 
available  for  work,  but  in  SIPP  no  availability 
test  is  applied. 

Because the CPS is basically a labor force
survey, it collects more information about the
spell of unemployment than SIPP. For example,
the CPS gathers information on reasons for
unemployment whereas SIPP does not. One can tell
from CPS data whether a jobless person has become
unemployed because of job loss, such as a layoff;
quitting a job to search for another; entering
the labor force for the first time; or re-entering
the labor force. In SIPP, the only persons for
whom the reason for unemployment is known
are those who report themselves as having
jobs from which they are absent because of layoff.
The CPS also asks about the method of job
search and how long one has been searching or on
layoff.

Labor  force.  The  civilian  labor  force  in  the  CPS 
is  derived  by  adding  the  number  of  persons  clas- 
sified as  employed  during  the  reference  week  to 
the  number  who  were  classified  as  looking  for 
work  or  on  layoff.  The  "total"  labor  force  is  de- 
rived by  adding  to  the  civilian  labor  force  an 
independent  estimate  of  the  Armed  Forces  sta- 
tioned  in  the  United  States. 

The labor force in SIPP (which includes members
of the Armed Forces living in households
but not in installations of the Armed Forces) is
referred to as "Persons with some labor force
activity." This represents the sum of persons
who, during the four month reference period,
may have been --

  employed during all weeks,
  unemployed during all weeks,
  employed and unemployed during all weeks,
  employed and outside the labor force during all weeks,
  unemployed and outside the labor force during all weeks,
  and employed, unemployed, and outside the labor force
  during all weeks.

In other words, this group includes anyone with
some contact with the labor market in the four
month reference period.

Unemployment rate. The unemployment rate from
the CPS is one of the most well known statistics
in the Nation. It is derived by dividing the
number of unemployed persons by the civilian
labor force (or total labor force). In SIPP a
similar rate, or proportion, could be calculated.
Unlike the CPS unemployment rate definition,
however, the numerator in the SIPP definition
is composed of persons who may have been

  unemployed during all weeks,
  employed and unemployed during all weeks,
  unemployed and outside the labor force during all weeks,
  and employed, unemployed, and outside the labor force
  during all weeks.

In other words, the numerator is composed of
"Persons with some unemployment." Dividing the
sum of these groups by persons with some labor
force activity (the denominator) will yield
the proportion with some unemployment.
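
The proportion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly status codes "E" (employed), "U" (looking for work or on layoff), and "N" (outside the labor force), the function name, and the sample persons are all hypothetical and are not taken from the SIPP questionnaire or file layout.

```python
from typing import Dict, List


def sipp_unemployment_proportion(weekly_status: Dict[str, List[str]]) -> float:
    """Persons with some unemployment divided by persons with some
    labor force activity, over one reference period.

    Assumed codes (hypothetical): "E" employed, "U" unemployed,
    "N" outside the labor force.
    """
    with_activity = 0
    with_unemployment = 0
    for weeks in weekly_status.values():
        statuses = set(weeks)
        # Any week employed or unemployed counts as labor force activity.
        if statuses & {"E", "U"}:
            with_activity += 1
            # Any week unemployed puts the person in the numerator.
            if "U" in statuses:
                with_unemployment += 1
    if with_activity == 0:
        raise ValueError("no persons with labor force activity")
    return with_unemployment / with_activity


# Four illustrative persons over a 17-week reference period.
people = {
    "p1": ["E"] * 17,                          # employed all weeks
    "p2": ["E"] * 8 + ["U"] * 9,               # employed and unemployed
    "p3": ["N"] * 4 + ["U"] * 4 + ["E"] * 9,   # all three statuses
    "p4": ["N"] * 17,                          # no labor force activity
}
proportion = sipp_unemployment_proportion(people)
```

In this example three persons have some labor force activity and two of them have some unemployment, so the proportion is two-thirds. Because a single week of unemployment is enough to enter the numerator, this figure will generally exceed a one-week CPS-style rate for the same population.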

Not in the labor force. In both the CPS and SIPP,
persons who have had no association with the job
market during the reference period (in SIPP, for
all or part of the reference period) are considered
outside the labor force. The CPS further
identifies their major activity as in school,
keeping house, unable to work, and so on. This
is not done in SIPP.

The  CPS  inquires  in  the  fourth  and  eighth 
rotation  groups  about  previous  work  experience, 
intentions  to  seek  work  again,  desire  for  a  job, 
and  reasons  for  not  looking.  This  makes  it  pos- 
sible to  estimate  the  number  of  "discouraged 
workers."  Discouraged  workers  in  the  CPS  are  de- 
fined as  persons  who  want  a  job  but  are  not  seek- 
ing work  currently  because:  1)  they  believe  no 
work  is  available  in  their  line  of  work  or  area; 
2)  they  could  not  find  any  work;  3)  they  lack  the 
necessary schooling or training, skills, or
experience; 4) employers think they are too young or
old;  and  5)  they  have  other  personal  handicaps  in 
finding  a   job,    such   as   transportation    problems. 

An  effort  is  made  to  identify  discouraged 
workers  in  SIPP  also,  even  though  it  is  difficult 
to  recall  a  state  of  mind.  For  those  persons  who 
did  not  work  or  look  for  work  in  at  least  part  of 
the  four  month  reference  period  but  said  they 
wanted  a  job  and  could  have  taken  one,  a  question 
is  asked  as  to  why  they  were  not  looking.  The 
reasons  for  not  looking  are  very  similar  to  those 
in  the  CPS  questionnaire. 

Hours  of  work.  In  the  CPS  a  question  is  asked 
about the number of hours someone worked during
the  reference  week  at  all  jobs.  This  question  is 
asked  of  all  rotation  groups  and  includes  workers 
who  have  more  than  one  job.  In  addition,  in  two 
of  the  eight  rotation  groups  a  question  is  asked 
about  the  hours  "usually"  worked  at  the  worker's 
main  job.  This  information  is  part  of  the  CPS 
data  collected  on  workers'    earnings. 

A  similar  set  of  questions  is  found  in  SIPP. 
Everyone  who  worked  is  asked  about  their  usual 
weekly  hours  on  all  jobs  during  the  four  month 
period.  Subsequent  questions  inquire  about  usual 
weekly hours for the primary job and any others.

Full-time and part-time employment. Full-time employment
in both surveys is defined as employment
of  35  hours  a  week  or  more  while  part-time  em- 
ployment is  anything  less  than  35  hours.  Both 
surveys  seek  the  reasons  for  part-time  employ- 
ment, that  is,  whether  it  was  due  to  economic 
reasons  or  other  factors.  Economic  reasons  in- 
clude slack  work,  material  shortages,  repairs  to 
plant  or  equipment,  start  or  termination  of  a 
job  during  the  week,  and  the  inability  to  find 
full-time work. "Other" reasons include labor
disputes, bad weather, one's own illness, vacation,
keeping house, no desire for full-time
work, and being a full-time worker during only part of
the  season.  In  the  SIPP  questionnaire  the  reasons 
for  part-time  employment  are  not  as  numerous  but 
it  is  still  possible  to  identify  some  economic 
reasons for part-time employment.

Uses of SIPP Labor Force Data

SIPP  was  designed  primarily  as  an  income  sur- 
vey and  the  data  from  it  will  be  used  to  address 
issues  related  to  income  security  and  social  wel- 
fare programs.  With  the  inclusion  of  questions  on 
labor  force  activity,  however,  this  survey  has 
potential  for  labor  force  analysis  and  topics 
related  to  it.  In  addition,  because  of  SIPP's 
sample  design  both  cross-sectional  and  longi- 
tudinal data  can  be  obtained  from  the  survey  pro- 
viding analysts  with  more  flexibility  in  their 
analyses. For example, it is possible to calculate
monthly averages of the labor force data
from SIPP waves since labor force activity is
tracked (week-by-week) over a four month period;
on  the  other  hand,  by  linking  all  the  SIPP  waves 
it  is  possible  to  follow  the  labor  force  activity 
of  individuals  over  two  and  one-half  years. 
While  the  CPS  will  continue  to  be  the  primary 
source  of  information  on  the  country's  labor  sup- 
ply and  the  current  unemployment  situation,  SIPP 
labor  force  data  will  complement  the  basic  CPS 
information  in  many  ways.  The  following  is  a 
discussion  of  some  of  the  applications  of  SIPP 
cross-sectional  and  longitudinal  labor  force 
data. 

Labor  market  related  economic  hardship.  For  many 
years  economists  have  tried  to  measure  the  econo- 
mic hardship  caused  by  labor  market  problems, 
whether  they  be  demand  oriented  (unemployment  due 
to  insufficient  jobs)  or  supply  oriented  (low 
wages  because  of  insufficient  skills  and  educa- 
tion). The  economic  literature  contains  many 
references  to  subemployment  indices,  employment 
and  earnings  inadequacy  indices,  and  labor  market 
hardship  measures  of  one  variety  or  another.!^./ 
The  NCEUS  in  1979  examined  this  subject  and  re- 
commended that  the  BLS  publish  an  annual  report 
"...  containing  measures  of  different  types  of 
labor  market  related  economic  hardship  result- 
ing from  low  wages,  unemployment,  and  insuffi- 
cient participation  in  the  labor  force. "iZ/  Using 
data  from  the  March  CPS,  the  BLS  has  produced 
such  reports  but  they  are  not  as  comprehensive  as 
they  might  be  because  of  data  limitations  (for 
example,  neither  the  hourly  earnings  for  part- 
year  workers  nor  the  problems  of  discouraged 
workers  are  discussed. ).!§/ 

SIPP  labor  force  and  income  data  should  be 
able  to  fill  the  gap.  For  example,  one  cross- 
sectional  table  specification  might  show  employ- 
ment problems  incurred  by  individuals  cross- 
classified  by  their  position  in  the  household 
income  distribution.  Problems  of  unemployment, 
low  hourly  wages  (below  the  Federal  minimum), 
discouragement,  and  involuntary  part-time  employ- 
ment could  be  isolated  to  help  in  formulating 
applicable  policies.  This  information,  in  combi- 
nation with  income  information,  is  available  on 
a  current  basis  only  from  SIPP. 

Labor mobility and turnover. Given the longitudinal
feature of SIPP's sample design, not only
can  the  income  flows  and  program  participation 
activities  of  individuals  be  monitored  for  two 
and  one-half  years  (and  periods  of  shorter  dura- 
tion), but  so  can  their  labor  force  activities. 
At  the  time  of  each  SIPP  interview,  information 
is  obtained  on  the  labor  force  activity  of  each 
household  member  age  15  and  older  during  the 
prior  four  months.  Any  changes  in  labor  force 
status  during  this  period  are  reflected  in  the 
data. Stitching together the data collected in
each  of  the  eight  or  nine  interviews  will 
provide  data  users  with  a  profile  of  labor 
market  activity  for  a  two  and  one-half  year 
period. 

One  change  in  labor  force  status  that  labor 
economists  have  been  interested  in  recently  is 
the  one  which  occurs  after  a  spell  of  unemploy- 
ment. Some  have  argued  that  many  outcomes  of 
spells of unemployment are withdrawals from the
labor force and not reemployment. For example,
two  economists,  using  CPS  gross  flow  data,  esti- 
mated that  45  to  50  percent  of  all  unemployment 
spells  end  by  labor  force  withdrawal  .Iz/  Other 
economists  have  argued  that  the  relative  short- 
ness of  the  average  unemployment  duration  shows 
that  persons  can  quite  easily  find  their  usual 
type  of  employment  in  a  short  period  of  time. ±9/ 
With  SIPP  labor  force  data  it  will  be  possible 
to  identify  job  terminations,  observe  spells  of 
unemployment,  and  determine  not  only  their  dura- 
tions, but  their  outcomes. 
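
As a rough illustration of how spell durations and outcomes might be read off a linked week-by-week record, the following Python sketch scans one person's status history. The status codes ("E", "U", "N"), the outcome labels, and the function name are assumptions made for this example; a spell still in progress at the end of the record is marked as censored because its outcome is not observed.

```python
from typing import List, Tuple


def unemployment_spells(weeks: List[str]) -> List[Tuple[int, str]]:
    """Return (duration in weeks, outcome) for each unemployment spell.

    Assumes every entry is one of the hypothetical codes "E" (employed),
    "U" (unemployed), or "N" (outside the labor force).
    """
    spells = []
    i = 0
    n = len(weeks)
    while i < n:
        if weeks[i] != "U":
            i += 1
            continue
        start = i
        # Advance to the first week after the run of "U" weeks.
        while i < n and weeks[i] == "U":
            i += 1
        if i == n:
            outcome = "censored"       # record ends before the spell does
        elif weeks[i] == "E":
            outcome = "reemployment"
        else:
            outcome = "withdrawal"     # next status is "N"
        spells.append((i - start, outcome))
    return spells


history = ["E", "E", "U", "U", "U", "E", "U", "U", "N", "N", "U"]
spells = unemployment_spells(history)
```

For this history the function reports a three-week spell ending in reemployment, a two-week spell ending in labor force withdrawal, and a one-week spell that is censored. Tabulating the outcome labels across persons would give the share of completed spells ending in withdrawal, which is the quantity at issue in the debate described above.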

SIPP  labor  force  data  should  also  be  useful 
in  calculating  rates  of  job  separation  and  acces- 
sion. The  measurement  of  the  amount  of  job  sepa- 
ration and  accession  is  an  important  element  in 
understanding  our  basic  employment  and  unemploy- 
ment statistics.  Since  the  discontinuance  of  the 
BLS's  labor  turnover  series,  researchers  have 
been  hard  pressed  to  find  other  data  sources 
which  would  shed  light  on  the  dynamics  of  the 
labor  market .11/  While  it  will  not  be  possible 
from  the  SIPP  labor  force  data  to  identify  the 
precise  nature  of  the  separation  (layoff,  quit, 
discharge)  or  accession  (new  hire,  recall),  ag- 
gregate separation  and  accession  rates  could  be 
calculated.  These  rates  could  be  monitored  over 
the  business  cycle. 
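
One way such aggregate rates might be computed is sketched below in Python. The definitions used here are assumptions for illustration only: a separation is a person with a job at the start of an interval and none at the end, an accession is the reverse, and both counts are divided by the number of persons with a job at the start. Moves directly from one employer to another are not captured by this simple comparison.

```python
from typing import Dict, Tuple


def turnover_rates(
    has_job_start: Dict[str, bool], has_job_end: Dict[str, bool]
) -> Tuple[float, float]:
    """Return (separation rate, accession rate) for one interval.

    Only persons observed at both dates are used. Both rates are
    expressed per person holding a job at the start of the interval.
    """
    persons = has_job_start.keys() & has_job_end.keys()
    employed_start = [p for p in persons if has_job_start[p]]
    if not employed_start:
        raise ValueError("no persons with a job at the start")
    separations = sum(1 for p in employed_start if not has_job_end[p])
    accessions = sum(
        1 for p in persons if not has_job_start[p] and has_job_end[p]
    )
    base = len(employed_start)
    return separations / base, accessions / base


# Hypothetical job-holding status for six persons at two dates.
start = {"a": True, "b": True, "c": True, "d": True, "e": False, "f": False}
end = {"a": True, "b": True, "c": True, "d": False, "e": True, "f": False}
separation_rate, accession_rate = turnover_rates(start, end)
```

Here one of four job holders separates and one person without a job gains one, so both rates equal 0.25. Repeating the calculation interval by interval would give series that could be followed over the business cycle.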

Summary 

SIPP  is  principally  an  income  survey,  but  it 
contains  questions  on  labor  force  activity  as 
well.  SIPP  labor  force  data  will  supplement  the 
labor  force  information  from  the  CPS,  the  Federal 
government's official source of labor force statistics.
Like the CPS, SIPP uses an activity concept
for sorting the Nation's population into
those persons involved in the job market and
those who are not. A major difference between
the  two  surveys  is  the  length  of  the  reference 
periods;  in  the  CPS  it  is  one  week  and  in  SIPP 
it  is  four  months.  The  different  length  of  time 
for  which  labor  market  activities  are  surveyed 
will  be  an  important  factor  in  SIPP  and  CPS  labor 
force  comparisons.  Nevertheless,  while  the  CPS 
will  continue  to  tell  us  how  many  persons  are 
employed  and  unemployed,  SIPP  will  be  able  to 
tell  us  how  well  the  labor  market  is  providing 
for  these  workers  and  their  households. 

NOTE: In addition to the references footnoted,
the SIPP Interviewer's Manual and CPS Interviewers
Reference Manual were used in the preparation
of this paper.

The new Survey of Income and Program
Participation  (SIPP)  will  undoubtedly  become 
a  major  source  of  data  on  a  wide  variety  of 
aspects  of  the  well-being  of  our  nation's 
households,  families,  and  individuals.  The 
very  richness  of  SIPP  suggests  the  desirabi- 
lity of  augmenting  it  with  micro-level  esta- 
blishment and  enterprise  data  from  the  econo- 
mic censuses  and  other  data  files  maintained 
by  the  Bureau  of  the  Census,  since  the  mar- 
ginal cost  of  merging  these  data  with  SIPP  is 
relatively  small  and  the  potential  gain  in 
knowledge  is  very  large.  One  area  where  the 
payoff  relative  to  cost  of  enhancing  SIPP  is 
sure to be substantial and significant is
that  pertaining  to  the  behavior  of  labor 
markets. 

A  list  of  some  of  the  areas  in  which  a 
SIPP-economic  data  file  can  yield  new  insights 
includes  the  following  topics: 

  The relationship between capital and wage rates
  Labor mobility
  Low wage workers and low wage firms
  Measuring the effects of minimum wage legislation
  Structural unemployment
  Identifying high tech workers and high tech firms
  Implications of the transition from a goods to a service economy
  Unions and the labor market
  The substitutability of capital and labor
  Productivity analysis

Besides  the  substantive  knowledge  to  be 
gained  by  merging  SIPP  demographic  and  econo- 
mic data,  there  are  externalities  associated 
with  merging  these  data  sets.  First,  it  will 
be  possible  to  verify  the  accuracy  of  the  size 
of  firm  estimates  given  by  respondents  in  sur- 
vey data.  An  additional,  indirect  benefit  of 
linking  SIPP  and  economic  data  stems  from  the 
fact  that  the  former  is  a  representative  sample 
of the working population. Matching on work
place will yield a stratified sample of firms
where the probability of selection is
proportional to firm size, since larger firms
employ more of the sampled workers. By weighting
the number of firms in each size group by the
inverse of that probability, estimates
for the entire population of firms can be
derived. The sample of employers would be
contained in a single data set with the same
format across employers, in contrast to the
diversity of data sets in which the economic
data are now found. These advantages plus the
manageable  size  of  the  sample  should  provide 
valuable  insights  into  the  structure  of  produc- 
tion within  and  across  sectors  of  the  economy 
at  a  point  of  time  and  over  time. 

1. SIPP and the Economic Data Files

In merging demographic and economic data, it is
necessary to know the information contained
in the various files to be linked and how
each file is constructed. In this section, we
briefly  describe  four  data  sets  which  might  be 
incorporated  into  a  SIPP-economic  data  file. 

SIPP  contains  demographic  and  program  re- 
lated data.  Economic  data  are  found  in  the 
Standard  Statistical  Establishment  List  (SSEL), 
the  Longitudinal  Establishment  Data  (LED)  file, 
and  the  enterprise  statistics  (ES).  The  SSEL 
covers  all  establishments  and  companies  with 
employees  and  yields  current  information  on 
employment  and  payroll.  The  LED,  as  its  name 
implies,  contains  longitudinal  data  but  is 
restricted  to  manufacturing  establishments. 
The  ES,  on  the  other  hand,  covers  companies 
in  the  construction,  mineral,  manufacturing, 
wholesale  trade,  and  retail  trade  industries, 
and  most  service  industries. 

The SSEL is a complete directory of establishments
in single and multi-establishment
enterprises  with  one  or  more  employees,  irre- 
spective of  industry.  The  SSEL  links  parent 
companies,  subsidiaries,  and  their  establish- 
ments. It  contains  information  on  approxi- 
mately 4.7  million  enterprises  and  5.7 
million  establishments. 

The  importance  of  the  SSEL  is  that  it  is 
a  current  file  containing  a  complete  list  of 
establishments  and  companies  with  paid  em- 
ployees. While  the  SSEL  contains  a  narrow 
range  of  economic  data,  these  data  impart 
valuable  information.  For  example,  the  SSEL 
contains  the  address  of  the  physical  location 
of  establishments  which  is  useful  for  merging 
the  demographic  and  economic  data,  since  it 
is  a  primary  link  in  identifying  an  indivi- 
dual's place  of  work.  Employment  and  payroll 
figures  yield  an  estimate  of  average  annual 
earnings,  thereby  indicating  whether  an  em- 
ployer is  a  low  or  high  wage  employer.  JL/ 
Sales  and  employment  figures  provide  a  proxy 
measure  of  productivity.  Operational  status 
information  can  be  utilized  to  identify  esta- 
blishments which  have  become  inactive.  Addi- 
tionally, the  SSEL  contains  longitudinal  in- 
formation. Currently,  establishment  and  com- 
pany data  are  carried  for  two  years  in  the 
SSEL. 
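
The two derived measures mentioned above amount to simple ratios, sketched here in Python. The record layout, field names, and figures are hypothetical and do not reflect the actual SSEL file format; the sketch only shows the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class EstablishmentRecord:
    """Hypothetical subset of the items carried for one establishment."""
    employment: int
    annual_payroll: float
    annual_sales: float


def average_annual_earnings(rec: EstablishmentRecord) -> float:
    """Payroll per employee, used to judge low versus high wage employers."""
    if rec.employment <= 0:
        raise ValueError("employment must be positive")
    return rec.annual_payroll / rec.employment


def sales_per_employee(rec: EstablishmentRecord) -> float:
    """Sales per employee, a rough proxy for productivity."""
    if rec.employment <= 0:
        raise ValueError("employment must be positive")
    return rec.annual_sales / rec.employment


plant = EstablishmentRecord(
    employment=40, annual_payroll=600_000.0, annual_sales=2_000_000.0
)
earnings = average_annual_earnings(plant)
productivity_proxy = sales_per_employee(plant)
```

For this illustrative establishment, average annual earnings are 15,000 and sales per employee are 50,000. Where the line between a low wage and a high wage employer should fall is an analytic choice that the ratios themselves do not settle.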

The  LED  is  a  longitudinal  micro-data  base 
containing  data  at  the  establishment  level  from 
the  Annual  Survey  of  Manufactures  and  the 
Census  of  Manufactures. 

The  LED  provides  a  much  broader  range  of 
information  about  establishments  than  the 
SSEL.  For  each  manufacturing  establishment, 
value  added  per  production  worker,  which  is  a 
measure  of  labor  productivity,  can  be  calcu- 
lated. For  the  larger  establishments  with 
250  or  more  workers,  information  is  available 
on  depreciable  assets  and  rented  machinery 
so  that  capital/labor  ratios  can  be  computed. 
Also,  a  better  measure  of  labor  compensation, 
including  fringe  benefits,  can  be  obtained. 

Like the Census of Manufactures, the
enterprise statistics (ES) are collected
every five years. These data cover enterprises
whose  primary  activity  is  in  an  in-scope  indus- 
try. For  each  enterprise,  the  data  are  consol- 
idated over  all  operating  units.  The  informa- 
tion contained  in  the  ES  is  similar  to  that  in 
the  Census  of  Manufactures,  except  that  fringe 
benefit,  asset,  and  related  data  are  available 
only  for  companies  with  500  or  more  workers. 

2. Some Applications of Micro-Demographic and Economic Data

In this section, three applications of a SIPP-
economic data file are discussed to illustrate
how this data set can help explain policy
issues relating to earnings and employment.

A. Low Wage Workers and Low Wage Firms
While  survey  data  such  as  the  CPS  provide 
insights  into  the  characteristics  of  low 
wage  workers,  they  provide  no  information  about 
low wage firms. Despite this lack of information,
an a priori description of low and high
wage  firms  can  be  formulated.  All  else  being 
the  same,  low  wage  firms  will  be  labor  inten- 
sive and,  hence,  tend  to  be  smaller  than  high 
wage  firms.  And  because  recruitment  and  hiring 
cost  relative  to  the  level  of  wages  will  tend 
to  be  high,  low  wage  firms  will  also  advertise 
less for labor and employ fewer screening devices
to weed out unsuitable workers; thus, their
work  force  will  be  of  lesser  quality  than  their 
high  wage  counterparts.  Less  qualified  work- 
ers, on  the  other  hand,  e.g.,  younger  workers 
and  those  who  are  less  educated,  will  be 
attracted  to  low  wage  firms  because  their 
marginal  product  is  less  than  that  required  to 
gain  employment  in  high  wage  firms.  More 
generally,  workers  with  given  characteristics 
and  tastes  sort  themselves  among  firms  with 
similar  requirements  for  labor. 

Corresponding  to  the  greater  prevalence 
of  low  quality  workers  in  low  wage  firms,  one 
might  expect  that  in  these  firms  (vis-a-vis 
high  wage  firms)  a  higher  proportion  of  capi- 
tal expenditures  is  for  used  rather  than  new 
machinery  and  equipment;  likewise,  the  pro- 
portion of  depreciable  assets  retired  each 
year  is  likely  to  be  smaller  in  such  firms. 
Furthermore,  given  that  labor  is  of  lesser 
quality  and  capital  is  of  an  older  vintage, 
it  would  not  be  surprising  if  value  added 
per  worker  were  relatively  low  in  low  wage 
firms. 

Other  characteristics  are  more  easily  seen 
by  focusing  on  high  wage  firms.  To  the  extent 
that  high  wage  firms  are  capital  intensive, 
their  need  for  trained  workers  is  likely  to  be 
greater  than  that  of  low  wage  firms.  Capital 
intensiveness suggests greater use of resources
to  monitor  output;  hence,  a  higher  proportion 
of  the  work  force  may  be  needed  in  supervisory 
positions.  To  reduce  turnover,  which  disrupts 
the  production  process,  high  wage  firms  will 
substitute  future  benefits  in  the  form  of 
pensions  for  current  benefits  in  the  form  of 
wages.  A  SIPP-economic  data  file  would  permit 
verification  of  these  hypotheses. 

Information  about  low  and  high  paying 
firms  is  important  for  another  reason  besides 
the  light  it  sheds  on  how  production  is  organ- 
ized in  these  two  types  of  firms.  Since  low 
paying  firms  are  a  source  of  employment  for 
workers with relatively low productivity, it is
of some interest to inquire into the extent to
which  low  pay  among  workers  is  attributable  to 
their  employment  in  such  firms.  In  approaching 
the  question  of  why  some  workers  are  paid  less 
than  others  in  this  manner,  low  wage  employers 
can be viewed as providing employment opportunities
with attendant low earnings, not because
they discriminate against certain groups of individuals,
but because the production processes
that are most efficient for their mode of operation
do not require high quality labor and,
furthermore, inhibit the payment of high
wages.

A  procedure  for  verifying  this  view  would  be 
to  sector  firms  according  to  whether  they  are 
low  paying  or  high  paying.  With  this  sectoring 
of  firms,  one  would  expect,  as  indicated  above, 
that  the  mix  of  workers  and  capital  is  dissimi- 
lar between  the  two  sectors.  Assuming  this  is 
so,  to  what  extent  are  differences  in  individ- 
ual earnings  in  low  and  high  paying  firms  due 
to  the  characteristics  of  the  workers  and  capi- 
tal employed  in  each  type  of  firm?  Also,  to 
what extent are workers with similar characteristics
remunerated in the same way in each type
of  firm?  The  answers  to  these  questions  can  be 
obtained  from  a   SIPP-economic  data   file. 
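
One common way to frame these two questions, offered here only as an illustration and not as the procedure the pilot study would use, is a two-part decomposition in the spirit of Oaxaca and Blinder: fit an earnings equation separately in each sector, then split the gap in mean earnings into a part due to differences in characteristics and a part due to differences in how those characteristics are paid. The Python sketch below uses a single characteristic (years of schooling) and made-up data; all names and figures are hypothetical.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares with one regressor: (intercept, slope)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def decompose_gap(
    x_high: List[float], y_high: List[float],
    x_low: List[float], y_low: List[float],
) -> Tuple[float, float, float]:
    """Return (mean gap, characteristics part, returns part).

    The high wage sector's slope is used to value the difference in
    mean characteristics; the remainder reflects differences in pay
    for the same characteristics.
    """
    a_h, b_h = fit_line(x_high, y_high)
    a_l, b_l = fit_line(x_low, y_low)
    mx_h = sum(x_high) / len(x_high)
    mx_l = sum(x_low) / len(x_low)
    gap = sum(y_high) / len(y_high) - sum(y_low) / len(y_low)
    characteristics = b_h * (mx_h - mx_l)
    returns = (a_h - a_l) + (b_h - b_l) * mx_l
    return gap, characteristics, returns


# Hypothetical schooling (years) and hourly earnings in each sector.
school_high, pay_high = [12.0, 14.0, 16.0, 18.0], [10.0, 12.0, 14.0, 16.0]
school_low, pay_low = [8.0, 10.0, 12.0, 14.0], [5.0, 6.0, 7.0, 8.0]
gap, characteristics, returns = decompose_gap(
    school_high, pay_high, school_low, pay_low
)
```

With these figures the mean gap of 6.5 splits into 4.0 attributable to the difference in schooling and 2.5 attributable to differences in pay for the same schooling. A fuller analysis would add further worker characteristics and the capital measures from the economic files, and the split depends on which sector's coefficients are taken as the reference.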

B. Structural Unemployment
An issue of
long  standing  is  what  happens  to  workers  who 
are  displaced  from  their  job  as  a  result  of 
structural  disequilibria.  How  long  do  they 
remain  unemployed  vis-a-vis  other  workers  who 
separate  from  an  employer?  What  sources  of 
income,  including  cash  and  noncash  government 
transfers,  do  they  draw  on  when  they  are  unable 
to  find  work?  When  they  find  a  job,  how  do 
earnings  in  the  new  job  compare  to  earnings  in 
the  old  one?  If  there  is  an  earnings  loss, 
how  much  of  this  loss  is  recouped,  say,  after  2 
years? 

A  major  problem  in  answering  these 
questions  is  that  workers  do  not  know  if  they 
are  structurally  unemployed.  One  way  of  iden- 
tifying such  workers  is  to  ascertain  what  has 
happened  to  the  firms  in  which  they  were  last 
employed. 

If  the  firm  has  undergone  a  substantial  de- 
cline in  employment  or  has  closed  down  for  a 
relatively long period of time, say, longer
than  the  typical  recession,  one  may  presume 
that  it  has  undergone  a  shock  which  is  typi- 
cal of  the  shocks  experienced  by  firms  subject 
to  structural  disequilibria.  It  also  can  be 
presumed  that  the  employees  of  these  firms 
experience  the  aftereffects  of  such  shocks. 

A  SIPP-economic  data  file  would  enable  one 
to  determine  the  extent  to  which  firms  are 
subject  to  severe,  long-term  shocks  as  evi- 
denced by  plant  closures  and  substantial 
reductions  in  employment,  and  how  such  shocks 
affect  their  work  force. 
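
A screening rule of this kind might look like the following Python sketch. The 30 percent decline and the two-year persistence requirement are placeholder values chosen for illustration, not thresholds proposed in this paper, and the annual employment series are invented.

```python
from typing import List


def is_structural_shock(
    employment_by_year: List[int],
    decline_share: float = 0.3,
    min_years: int = 2,
) -> bool:
    """Flag a firm whose employment stays well below its starting level.

    The firm is flagged when employment in each of the last `min_years`
    years is at or below (1 - decline_share) times the first year's
    level. A closed firm (zero employment) satisfies the rule as well.
    Both parameters are hypothetical placeholders.
    """
    if len(employment_by_year) < min_years + 1:
        return False
    baseline = employment_by_year[0]
    if baseline <= 0:
        return False
    ceiling = (1 - decline_share) * baseline
    recent = employment_by_year[-min_years:]
    return all(e <= ceiling for e in recent)


sustained_decline = is_structural_shock([100, 95, 60, 55])
temporary_dip = is_structural_shock([100, 60, 98, 102])
closure = is_structural_shock([50, 50, 0, 0])
```

The first and third firms are flagged, while the second, which recovers after a one-year dip, is not. Workers whose last employer is flagged could then be compared with other job separators on unemployment duration, income sources, and subsequent earnings.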

C.  High  Tech  Workers  and  High  Tech  Firms 
Despite the importance of new technologies
for improving productivity, there is no widely
accepted  definition  of  a  high  tech  industry. 
Based  on  a  definition  which  includes  indus- 
tries with  a  ratio  of  technology-oriented 
workers  1!  to  all  workers  of  at  least  1.5 
times  the  industry-wide  average,  Riche,  Hecker, 
and  Burgan  (1983)  estimate  that  13.4  percent  of 
all  wage  and  salary  workers  were  employed  in 
high tech industries in 1982.
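
The ratio criterion just described can be written out as a short Python sketch. The industry names and counts are invented, and the sketch assumes the average is the pooled ratio across all industries (total technology-oriented workers divided by total workers); the cited study may have defined the average differently.

```python
from typing import Dict, List, Tuple


def high_tech_industries(
    industries: Dict[str, Tuple[int, int]], multiple: float = 1.5
) -> List[str]:
    """Industries whose technology-oriented share of employment is at
    least `multiple` times the pooled all-industry share.

    Each value is (technology-oriented workers, all workers).
    """
    total_tech = sum(tech for tech, _ in industries.values())
    total_all = sum(workers for _, workers in industries.values())
    overall = total_tech / total_all
    return sorted(
        name
        for name, (tech, workers) in industries.items()
        if tech / workers >= multiple * overall
    )


# Hypothetical counts for four industries.
counts = {
    "computers": (300, 1000),
    "chemicals": (150, 1000),
    "apparel": (10, 1000),
    "retail": (20, 2000),
}
flagged = high_tech_industries(counts)
```

The pooled share here is 480 of 5,000 workers, or 9.6 percent, so the cutoff is 14.4 percent; the two industries with shares of 30 and 15 percent are flagged and the other two are not.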

High  tech  industries  have  been  cited  as 
having  a  large  group  of  high  and  low  wage  work- 
ers whereas  other  industries  are  comprised 
of  workers  who  are  concentrated  in  the  middle 
of  the  earnings  distribution.  While  it  is  use- 
ful to  know  how  workers  in  high  tech  and  other 
industries  differ  and  the  differential  growth 
of  employment  in  the  two  kinds  of  industries, 
it  is  equally  important  to  know  the  character- 
istics which  differentiate  high  tech  from  other 
firms  and  the  differential  in  the  rate  of 
growth  of  the  two  types  of  firms. 

As  is  self-evident,  not  all  firms  in  high 
tech  industries  utilize  the  latest  technology, 
and  new  techniques  of  production  are  utilized 
by  firms  in  industries  besides  those  labeled 
as  high  tech.  One  approach  to  distinguishing 
between  the  two  types  of  firms  would  be  to  com- 
pare the  characteristics  of  the  industries 
denoted on a priori grounds as high tech with
other  industries  and  then  to  use  this  informa- 
tion to  identify  high  tech  firms.  To  illus- 
trate this  approach,  assume  that  the  a  priori 
criterion  used  to  denote  high  tech  industries 
is  the  one  noted  above,  namely,  that  the  ratio 
of  high  tech  to  all  workers  in  a  given  industry 
to  the  similar  ratio  for  all  industries  is 
higher  than  some  minimum  value.  Assume  also 
that  the  high  tech  industries  exhibit  high 
values of the following ratios: capital expenditures
for new computers to all capital expenditures,
capital expenditures to asset value,
and  capital  to  labor.  Given  a  set  of  charac- 
teristics which  permit  the  bifurcation  of 
industries,  the  multivariate  technique  of 
cluster  analysis  can  then  be  applied  to  iden- 
tify high  tech  firms  within  both  high  tech 
and  other  industries. 

The  outcome  of  the  cluster  analysis  is  a 
partitioning  of  firms  into  categories,  i.e., 
high  tech  and  nonhigh  tech  firms,  as  determined 
by  the  data,  where  each  cluster  of  firms  repre- 
sents a  homogeneous  set  of  observations.  An 
advantage  of  applying  the  aforementioned  two- 
stage  procedure  using  a  SIPP-economic  data  file 
is  that  it  provides  an  independent  test  of  how 
well  the  procedure  works.  For  if  the  approach 
is  successful,  the  proportion  of  workers  who 
are  technology-oriented  among  the  firms  classi- 
fied as  high  tech  (taken  as  a  group)  will  be 
higher  than  the  similar  proportion  for  firms 
classified  as  nonhigh  tech  (again,  taken  as 
a  group),  and  the  difference  in  proportions 
will be greater than the corresponding difference
when industries are classified as high
tech  and  nonhigh  tech. 
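
The two-stage procedure and its independent test can be sketched as follows. This is a minimal illustration only: the paper names cluster analysis but no particular algorithm, so a two-cluster k-means is assumed here, and the firm records, the three ratios chosen, and all function names are hypothetical.

```python
import math

# Hypothetical firms: (name, computer share of capital spending,
# capital spending / asset value, capital per worker,
# technology-oriented workers, all workers).
FIRMS = [
    ("A", 0.30, 0.20, 90.0, 40, 100),
    ("B", 0.28, 0.18, 85.0, 35, 100),
    ("C", 0.05, 0.06, 30.0, 5, 100),
    ("D", 0.04, 0.05, 25.0, 4, 100),
    ("E", 0.25, 0.22, 95.0, 30, 100),
    ("F", 0.06, 0.07, 20.0, 6, 100),
]


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def two_means(rows, max_iter=50):
    """Two-cluster k-means; cluster 1 starts at the firm with the highest first ratio."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])
    centers = [list(rows[order[0]]), list(rows[order[-1]])]
    labels = None
    for _ in range(max_iter):
        # Assign each firm to the nearer of the two cluster centers.
        new_labels = [min((0, 1), key=lambda k: math.dist(row, centers[k]))
                      for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of its current members.
        for k in (0, 1):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def tech_share(firms, labels, cluster):
    """Share of technology-oriented workers among all workers in one cluster."""
    chosen = [f for f, lab in zip(firms, labels) if lab == cluster]
    return sum(f[4] for f in chosen) / sum(f[5] for f in chosen)


labels = two_means(standardize([f[1:4] for f in FIRMS]))
high = tech_share(FIRMS, labels, 1)
low = tech_share(FIRMS, labels, 0)
```

The worker counts play no part in forming the clusters, so comparing `high` with `low` afterwards is the kind of independent check described above.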

Having  identified  high  tech  firms,  in  con- 
trast to  high  tech  industries,  insights  can 
then  be  obtained  as  to  how  production  processes 
in these firms differ from their nonhigh tech
counterparts.  At  the  same  time,  it  will  enable 
one  to  better  define  high  tech  occupations  and 
how  workers  in  these  (and  other)  occupations  in 
high  tech  firms  differ  from  similar  workers  in 
nonhigh  tech  firms. 

3. The Pilot Study. A principal part of
the  pilot  study  is  designed  to  assess  the 
availability,  sources,  coverage,  and  content 
of  the  various  economic  data  files  maintained 
by the Bureau of the Census and to explore
study areas and issues to which a data set
combining micro-worker and firm data would be
applied. In the course of this study, specific
demographic  and  economic  variables  have  been 
identified  which  should  be  incorporated  into 
such  a  data  set.  Additionally,  it  was  antici- 
pated that  methodological  problems  inherent  in 
this  undertaking  would  be  revealed;  indeed, 
this  has  been  the  case. 

A  second  phase  of  the  pilot  study  is  to 
investigate  the  efficiency  of  four  alternative 
methods  of  identifying  an  individual's  employ- 
er. Each  method  is  based  on  different  infor- 
mation for  searching  the  SSEL  and  identifying 
the  employer's  census  file  number  (CFN).  The 
first  utilizes  information  on  employer  name, 
the  state  of  residence  and/or  zip  code  of  the 
employee,  and  census  industry  code.  The  same 
information is used in the second method; however,
additional reference materials, e.g.,
1980  Census  Company  Name  and  Place  of  Work 
lists,  Dun  and  Bradstreet  reference  books, 
Standard  and  Poor  directories,  and  telephone 
books,  will  be  used  to  obtain  the  exact  address 
of  an  individual's  employer.  The  third  method 
uses  the  employer's  name  and  exact  address  if 
known.  In  the  last  method,  if  the  employer's 
identification  number  (EIN)  is  known,  it  is 
used  in  conjunction  with  the  information  avail- 
able in  the  first  three  methods  to  identify  the 
employer's  CFN.  For  each  method,  match  rates 
and  cost  information  will  be  developed  for  a 
small  sample  of  workers. 

A  third  phase  of  the  study  is  the  construc- 
tion of  a  pilot  SIPP-economic  data  file  in 
which the SIPP portion of the file would be
restricted to full-time workers in large manufacturing
establishments; the source of the economic
data would be the LED. The objective in
this  phase  is  to  calculate  match  rates  between 
workers  in  SIPP  and  their  establishments  in  the 
LED. 

Given the importance of the wage determination
process, one of the areas noted above,
e.g.,  low  wage  workers  and  low  wage  firms, 
would  be  studied  when  the  pilot  work  file  is 
completed.  Demonstration  of  the  utility  of 
this  research  endeavor  in  terms  of  its  contri- 
bution to  the  economic  literature  would  con- 
stitute the  final  phase  of  the  pilot  study. 

4. Methodological Problems in Matching
Demographic and Economic Data. In this section,
attention is focused on two methodological
problems. The first problem deals with
procedures  for  tying  workers  to  their  estab- 
lishment and  company.  The  second  relates  to 
the  estimation  of  data,  in  particular,  asset 
and  fringe  benefit  data,  which  although  avail- 
able for  large  establishments  and  companies, 
are  generally  not  collected  for  small  ones. 

Central  to  the  creation  of  a  SIPP-economic 
data  file  is  the  ability  to  determine  the  esta- 
blishment and/or  company  in  which  a  person  is 
employed.  The  most  promising  and  least  expen- 
sive way  of  doing  this  is  to  match  on  firm 
name  and  physical  address  of  an  individual's 
place  of  work.  This  information  will  be  avail- 
able in  SIPP  and  is  available  in  the  SSEL. 
Although  the  physical  address  is  not  necessary 
for  identification  of  an  individual's  work 
place, its availability greatly facilitates
such identification since a firm may have more
than  one  establishment  in  a  local  area. 

For  employers  with  only  one  establishment  in 
an  area,  the  firm  name  and  employee's  address 
will  typically  be  sufficient  to  determine  where 
a  person  is  employed.  As  noted,  for  companies 
with  more  than  one  establishment  in  an  area, 
the  firm  name  and  employer's  address  should  be 
sufficient  to  identify  the  place  of  work.  If 
an  employer  has  more  than  one  establishment 
in  an  area  and  the  place  of  work  cannot  be 
determined  using  the  employer's  physical 
address,  or  no  address  is  available,  other  in- 
formation in  SIPP  can  be  utilized.  Firm  name, 
respondent's  address,  census  industry  code,  and 
respondent's  estimate  of  size  of  establishment 
and  company  can  be  used  to  identify  a  person's 
work  place.  For  example,  it  is  unlikely  that  a 
firm  manufacturing  bottles  will  have  more  than 
one  large  plant  in  a  local  area. 
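
The sequence of fallbacks just described can be sketched as a narrowing search. The sketch is illustrative only: the record layout, the field names, the industry labels, and the function `find_workplace` are assumptions, not the layout of the SSEL or of SIPP.

```python
from collections import namedtuple

# Hypothetical stand-in for an SSEL establishment record.
Establishment = namedtuple(
    "Establishment", "cfn name area address industry size_class")

SSEL = [
    Establishment("001", "ACME BOTTLE", "AREA-1", "1 MAIN ST", "glass", "large"),
    Establishment("002", "ACME BOTTLE", "AREA-1", "9 DOCK RD", "glass", "small"),
    Establishment("003", "SOLO DINER", "AREA-1", "5 ELM ST", "eating", "small"),
]


def find_workplace(ssel, name, area, employer_address=None,
                   industry=None, size_class=None):
    """Return the CFN of the single matching establishment, or None."""
    # Firm name plus the local area of the respondent is the starting point.
    candidates = [e for e in ssel if e.name == name and e.area == area]
    # Narrow by employer address first, then by industry and reported size.
    for field, value in (("address", employer_address),
                         ("industry", industry),
                         ("size_class", size_class)):
        if len(candidates) <= 1:
            break
        if value is not None:
            narrowed = [e for e in candidates if getattr(e, field) == value]
            if narrowed:  # keep the old list if this item matches nothing
                candidates = narrowed
    return candidates[0].cfn if len(candidates) == 1 else None


single = find_workplace(SSEL, "SOLO DINER", "AREA-1")
by_address = find_workplace(SSEL, "ACME BOTTLE", "AREA-1",
                            employer_address="9 DOCK RD")
by_size = find_workplace(SSEL, "ACME BOTTLE", "AREA-1",
                         industry="glass", size_class="large")
unresolved = find_workplace(SSEL, "ACME BOTTLE", "AREA-1")
```

A case left unresolved (`None`) is the situation in which the employer's characteristics would have to be imputed.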

Another  aid  in  identifying  an  individual's 
work  place  is  the  EIN.  While  a  company  may 
have  a  number  of  establishments  in  a  local 
area,  its  subsidiaries,  when  identified  by 
their  own  EIN,  may  have  only  one  establishment 
in  the  area.  Thus,  the  EIN  of  the  employer  for 
whom  an  individual  works  can  be  sufficient  to 
uniquely  determine  the  establishment  in  which 
that  person  is  employed. 

In the event that a unique work place cannot
be determined for a multi-establishment
firm,  the  employer's  characteristics  can  be 
imputed.  Data  from  the  SSEL  on  number  of 
employees  and  payroll  can  be  averaged  over  a 
company's  establishments  in  a  local  area.  From 
the  ES  file,  average  values  can  be  computed 
for  variables  not  contained  in  the  SSEL.  For 
example,  the  average  capital/labor  ratio  for  a 
company  with  a  chain  of  fast-food  stores  can  be 
used as an estimate of the capital/labor ratio
for  each  store  in  the  chain. 

Where  it  is  not  possible  to  identify  a  work- 
er's firm  by  name  in  the  SSEL,  imputations  can 
be  made  by  averaging  over  establishments  in  the 
same  local  area  and  with  the  same  census  indus- 
try code  as  that  of  the  given  employer.  Addi- 
tionally, it  may  be  possible  to  refine  the 
imputation  process  by  considering  information 
contained  in  SIPP,  e.g.,  the  size  of  establish- 
ment in  which  an  individual  works  and  whether 
the  firm  has  one  or  more  than  one  establish- 
ment. 
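
A minimal sketch of imputation by cell averaging is given below. The records, field names, and the helper `impute_by_cell` are hypothetical; the optional size class stands in for the refinement based on the establishment size reported in SIPP.

```python
# Hypothetical establishment records; values are illustrative only.
ESTABLISHMENTS = [
    {"area": "X", "industry": "retail", "size_class": "small", "employees": 10},
    {"area": "X", "industry": "retail", "size_class": "small", "employees": 20},
    {"area": "X", "industry": "retail", "size_class": "large", "employees": 300},
    {"area": "Y", "industry": "retail", "size_class": "small", "employees": 12},
]


def impute_by_cell(establishments, area, industry, variable, size_class=None):
    """Average a variable over establishments in the same area and industry.

    If size_class is given, the cell is restricted further to that size
    class. Returns None when the cell is empty.
    """
    cell = [e[variable] for e in establishments
            if e["area"] == area
            and e["industry"] == industry
            and (size_class is None or e["size_class"] == size_class)]
    return sum(cell) / len(cell) if cell else None


coarse = impute_by_cell(ESTABLISHMENTS, "X", "retail", "employees")
refined = impute_by_cell(ESTABLISHMENTS, "X", "retail", "employees",
                         size_class="small")
missing = impute_by_cell(ESTABLISHMENTS, "Z", "retail", "employees")
```

In this made-up cell the coarse average is pulled up by the one large establishment, which is why the size information in SIPP can sharpen the imputed value.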

As  indicated,  information  on  assets  and 
fringe  benefits  is  not  generally  available  for 
small  establishments,  but  such  information  is 
available  for  a  large  sample  of  small  esta- 
blishments in  manufacturing.  Despite  the  fact 
that  asset  information  is  not  collected  for 
many  of  the  firms  in  which  individuals  work, 
the  use  of  an  economic  model,  including  indus- 
try, firm  size,  and  other  variables,  may  enable 
one  to  obtain  reasonably  accurate  estimates 
of  capital  for  small  establishments. 

Economic theory suggests a number of relationships
which influence the amount of capital
that a firm employs in its production
process. In particular, since capital intensity
varies with establishment size in closely
related  industries,  it  seems  reasonable  to 
assume  that  information  about  the  number  of 
employees  in  an  establishment  can  be  used  to 
further  refine  estimates  of  its  capital  assets. 
All  else  being  the  same,  one  would  expect  the 
smaller  an  establishment,  the  lower  would 
be its capital/labor ratio. 3/ Additionally,
holding  everything  else  constant,  including 
establishment  size,  low  wage  establishments 
will  substitute  labor  for  capital  in  order  to 
economize  on  the  use  of  the  relatively  expen- 
sive factor,  i.e.,  capital.  Thus,  low  wage 
establishments  will  tend  to  have  a  lower  capi- 
tal/labor ratio  than  high  wage  establishments. 

Even  among  establishments  of  the  same  size 
whose  wage  rate  is  also  the  same,  one  would 
expect  a  lower  capital/labor  ratio  the  higher 
the  proportion  of  production  workers  among  all 
workers. A high proportion of production
workers among all workers, or conversely,
a low percentage of workers who supervise
production, comes about because
a firm has few assets, relative to labor,
to monitor. Additional relationships between
assets  and  other  variables  may  exist.  For 
example,  it  may  be  that  newer  establishments  in 
an  industry  are  more  capital  intensive  than 
older  ones;  likewise,  regional  variations  in 
entrepreneurial  ability  may  give  rise  to  cor- 
responding variations  in  capital  intensity. 

Besides  economic  relationships,  engineering 
relationships  also  may  be  useful  in  estimating 
capital  intensity.  For  example,  it  is  plau- 
sible that  an  establishment's  capital/labor 
ratio  is  positively  related  to  purchased 
electricity  per  employee. 
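
As an illustration of how such a relationship might be used, the sketch below fits a straight line of the capital/labor ratio on purchased electricity per employee for establishments where both are observed, and then predicts the ratio for a small establishment. The single predictor is a simplification of the several variables discussed above, the figures are invented, and the function names are hypothetical. The last step applies the rule in footnote 3/, multiplying the estimated ratio by the number of workers.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical large establishments with both items observed.
electricity_per_employee = [10.0, 20.0, 30.0, 40.0]
capital_labor_ratio = [25.0, 45.0, 65.0, 85.0]

intercept, slope = fit_line(electricity_per_employee, capital_labor_ratio)


def estimate_assets(electricity, workers):
    """Predicted capital/labor ratio times employment (footnote 3/)."""
    return (intercept + slope * electricity) * workers


small_plant_assets = estimate_assets(25.0, 12)
```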

Finally,  an  economic  model  can  also  be  util- 
ized to  estimate  fringe  benefits  for  small 
establishments  and  small  companies.  It  is 
plausible  to  assume  that  fringe  benefits  in  a 
firm  are  related  to  its  size,  average  wage 
level,  legal  form  of  organization,  industry, 
and  region  where  it  is  located.  With  a  SIPP- 
economic  data  file  more  refined  estimates  of 
fringe  benefits  per  employee  can  then  be  ob- 
tained by  taking  account  of  the  percentage  of 
employees  who  are  covered  by  life  and  medical 
insurance  and  a  private  pension  plan  in  a 
given  group  of  firms,  say,  (small)  high  paying 
establishments  in  manufacturing.  Given  this 
information,  the  average  value  of  these  bene- 
fits per  covered  and  noncovered  worker  can  be 
calculated  for  each  establishment  in  the  group. 

An  economic  model  can  also  be  developed  to 
estimate  fringe  benefits  per  covered  and  non- 
covered  worker  in  small  establishments  and 
companies.  With  appropriate  information  in 
SIPP,  such  estimates  could  provide  a  basis  for 
imputing  an  important  component  of  private  non- 
cash benefits  to  individual  workers.  Although 
it  should  be  evident  from  the  discussion  of 
this  paper,  this  last  illustration  is  indica- 
tive of  the  benefits  to  be  derived  from  a  SIPP- 
economic  data  file. 


FOOTNOTES 

1/ The data referenced in this section as
well as the remainder of the paper are available
in the economic and (where applicable) in
the  SIPP  data  collected  by  the  Bureau  of  the 
Census. 

2/  Defined  as  engineers,  life  and  physical 
scientists,  mathematical  scientists,  engineer- 
ing and  science  technicians,  and  computer 
specialists. 

3/  An  estimate  of  a  firm's  assets  can  then  be 
obtained  by  multiplying  the  capital/labor  ratio 
estimate by the number of workers in its employ.

1. Some Problems Common to All Three Papers

Selection 

The  SIPP  is  subject  to  a  different  kind  of  se- 
lectivity than  earlier  cross  section  surveys  with 
which  we  are  familiar.  Understanding  that  selec- 
tivity is  an  important  part  of  the  research  that 
must  be  carried  out  on  SIPP  before  we  can  be  com- 
fortable with  its  results.  The  selectivity  that 
affects  migration  and  labor  force  measurement 
comes from the rules for following movers, attrition, rules
for  handling  first  wave  household  non-response, 
and  treatment  of  the  sample  universe.  The  selec- 
tivity that  affects  matching  comes  largely  from 
the  design  of  economic  data  samples.  Each  of 
these  problems  will  be  discussed  below. 

Truncation  of  Measures 

A  number  of  devices  in  the  questionnaire  result 
in  logical  exclusion  of  certain  kinds  of  data. 
The  respondents  report  detail  only  on  two  em- 
ployers; student  status  is  revealed  only  for 
some  persons;  income  is  fully  reported  only  for 
some  children.  This  truncation  is  highly  perti- 
nent to  migration  and  labor  force  measures;  less 
important  for  matching. 

Attitudinal  measures 

The  SIPP  is  almost  entirely  lacking  in  attitu- 
dinal measures.  We  need  to  ask  to  what  degree 
subjective  measures  are  important  for  the  three 
areas  covered  by  the  papers  above. 

Left  censoring 

Although  the  SIPP  records  much  current  informa- 
tion, the  duration  of  status  at  Wave  1  is  conspi- 
cuously lacking.  We  do  not  know  for  how  long  the 
respondent  was  (un)employed  prior  to  the  first 
interview;  for  how  long  (s)he  has  held  the  cur- 
rent position;  or  for  how  long  (s)he  has  resided 
in  the  current  residence.  Each  of  these  items 
must  be  known  to  understand  episodes  or  events  in 
the  lives  of  real  people  and  will  be  important 
conditioning  information  for  analyses  of  SIPP. 

2.  Issues  pertaining  to  Migration 
Dahmann  correctly  stresses  the  unique  potenti- 
alities of  SIPP  for  migration  statistics.  Unlike 
earlier  devices  for  measuring  migration,  the  SIPP 
gives high resolution to the timing of migration
events, and associated economic and demographic
circumstances before and after the migration
event.

By contrast, retrospective reports of migration
are severely biased, because measures of migration
are only obtained for those individuals who
remain in the household universe. Persons who
emigrate and persons who die are excluded. If
their  migration  experiences  are  different  from 
the  remaining  household  population,  retrospective 
measures  are  distorted. 

Unfortunately SIPP is not a perfect instrument
for  recording  migration  because  the  sample  sys- 
tematically censors  certain  types  of  migration. 
Firstly,  the  sample  is  not  updated  with  new  ad- 
dresses until  after  12  months  of  operation.  Thus 
immigrants to the household population are not
represented, unless they move into existing
households.  This  contrasts  with  the  design  of 
the CPS where the sample is renewed (a) by the
inclusion  of  a  new  rotation  in  each  month  and  (b) 
by  the  turnover  of  families  in  dwellings  which 
results  in  new  households  being  sampled  at  old 
addresses.  One  hopes  that  the  Bureau  will  esti- 
mate the  extent  of  this  problem,  which  affects 
both immigrants to the US population and immigrants
to the household universe in the 12-month
period  between  additions  to  the  panel  addresses. 

A  second  problem  results  from  following  rules. 
Movers  are  followed  to  locations  within  100  miles 
of the sample PSUs. This rule covers 95% of the
population,  but  clearly  selects  particular  types 
of  moves.  The  city  dwellers  who  tire  of  urban 
life  are  less  well  represented  than  persons  relo- 
cating because  of  new  employment  possibilities. 
The  Bureau  needs  to  assess  the  impact  of  alterna- 
tive following  rules  on  the  types  of  migration 
represented.  CPS  excludes  all  migration  events; 
ISDP  follows  migration  to  the  50-mile  radius; 
SIPP follows more extensively. These rules should
be  simulated  on  the  retrospective  reports  of  mi- 
gration so  that  differences  between  the  measures 
can  be  understood  as  differences  in  the  popula- 
tion sampled  and  differences  in  response  for 
identical  populations.  We  need  a  careful  evalua- 
tion. 
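
The simulation proposed here might look like the following sketch. The move records and the reasons attached to them are invented for illustration, and the only rule parameter modeled is the radius around the sample PSUs (none for CPS, 50 miles for ISDP, 100 miles for SIPP, as stated above).

```python
# Radius in miles within which movers are followed; None means no following.
FOLLOWING_RULES = {"CPS": None, "ISDP": 50.0, "SIPP": 100.0}

# Hypothetical retrospective reports of moves.
MOVES = [
    {"reason": "job", "miles_to_nearest_psu": 20.0},
    {"reason": "job", "miles_to_nearest_psu": 80.0},
    {"reason": "leave city", "miles_to_nearest_psu": 150.0},
    {"reason": "leave city", "miles_to_nearest_psu": 60.0},
    {"reason": "housing", "miles_to_nearest_psu": 5.0},
]


def coverage_by_reason(moves, radius):
    """Fraction of reported moves of each type that a rule would retain."""
    totals, kept = {}, {}
    for move in moves:
        reason = move["reason"]
        totals[reason] = totals.get(reason, 0) + 1
        if radius is not None and move["miles_to_nearest_psu"] <= radius:
            kept[reason] = kept.get(reason, 0) + 1
    return {reason: kept.get(reason, 0) / n for reason, n in totals.items()}


coverage = {survey: coverage_by_reason(MOVES, radius)
            for survey, radius in FOLLOWING_RULES.items()}
```

Comparing the three dictionaries in `coverage` separates differences due to the population retained from differences in response for identical populations.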

The  following  rules  also  preclude  the  following 
of  sample  members  into  the  institutional  popula- 
tion. This  rule  is  at  odds  with  the  principle 
that  SIPP  is  a  longitudinal  panel  of  individuals 
living  at  selected  addresses  in  Wave  1.  It  is 
understandable  to  question  whether  the  instrument 
designed  for  the  household  population  can  be  ap- 
plied in  other  settings.  However,  some  of  the 
most  important  economic  transitions  may  accompany 
shifts  out  of  the  household  universe.  The  design 
of  SIPP  needs  to  be  alert  to  that  possibility. 
It  must  also  permit  measurement  of  persons 
re-entering  the  household  universe  during  the 
life  of  the  panel. 

I heartily support Dahmann's plea for a special
module  of  questions  that  is  triggered  by  observed 
migration.  We  need  to  know  the  proximate  causes 
of  migration,  the  costs  associated  with  moves, 
and  planning  for  moves.  No  vehicle  could  be  more 
appropriate  for  such  questions  than  SIPP.  It  has 
a  sufficient  sample  size  so  that  the  migrating 
population  is  large  enough  to  study,  and  it  can 
measure  attitudes  about  migration  at  a  time  when 
people  can  easily  remember. 


3.  Labor  Force  Measurements 
Selection 

The  SIPP  will  be  less  selective  of  longitudinal 
labor  force  measures  than  matched  CPS  samples 
precisely  because  the  CPS  does  not  follow  any  mo- 
vers. (In  addition,  CPS  does  not  maintain  ade- 
quate control  over  the  identities  of  individuals 
so  that  matching  is  inferential,  rather  than  po- 
sitively based  on  an  identifier  such  as  the  So- 
cial Security  number.)  Nonetheless  selection  re- 
mains a  problem  for  SIPP.  The  failure  to  sample 
immigrants  mentioned  above,  and  the  failure  to 
follow  household  non-response  from  the  first  wave 
both cause systematic selection from the household
population at later points in time.

The designers of the PSID express as their greatest
regret that they did not attempt to follow
refusals  and  obtain  conversions.  Since  we  do  not 
have  any  data  on  conversion  rates  following  a 
four-month  lapse  of  time,  it  is  hard  to  judge  the 
success  of  such  an  effort.  It  is  clear,  however, 
that  persons  who  refuse  at  any  time  are  a  portion 
of  the  population  that  requires  diligence  to  sort 
out  because  they  are  likely  to  differ  from  coo- 
perative respondents. 

Truncation 

In  order  to  understand  the  implications  of  the 
complex  skip  structure  of  the  labor  force  ques- 
tions I  tested  the  sequence  on  the  assumption 
that  the  respondent  is  a  student  interviewed  in 
May  at  the  end  of  the  school  year.  If  he  was  not 
in  the  labor  force  during  the  reference  period, 
there  is  no  place  to  record  student  status.  If 
the  student  had  any  work  during  the  school  term 
and  was  not  underemployed,  school  status  is  not 
revealed.  Only  if  additional  work  is  desired  and 
unavailable  do  we  discover  student  status  as  a 
rationalization  for  not  working  and  not  looking 
for  work.  This  asymmetry  will  make  it  difficult 
to  interpret  information  on  youth. 

Another  instance  of  truncation  is  revealed  by 
the  interviewer  instructions.  Reports  of  persons 
engaged  in  unpaid  work  are  recorded  only  when 
that  effort  is  expended  in  connection  with  the 
business  of  a  family  member.  Work  for  relatives 
outside  the  household  and  voluntarism  (for  any 
organization)  is  not  captured. 

These  problems  are  mentioned  because  they  point 
to  a  set  of  conventions  established  in  connection 
with  CPS  data  that  need  not  be  blindly  applied  to 
an  innovative  survey  such  as  SIPP. 

The  purpose  of  SIPP  is  to  understand  broadly 
about  the  economic  roles  of  individuals  and  the 
relationship  of  those  roles  to  Federal  and  State 
programs for income protection. It is widely recognized
that a student role is vital to protecting
the capacity of our economy to produce in the
future,  and  that  individuals  investing  in  train- 
ing are  most  often  contributing  to  their  own  fu- 
ture economic  well-being.  It  does  not  make  sense 
to  construe  SIPP  so  narrowly  that  student  roles 
cannot be understood both as complements to work
roles  and  as  an  exclusive  status. 

Unpaid work roles are also nontrivial.
Persons  engaged  in  those  roles  obtain  skills  and 
satisfaction  and  contribute  to  the  production  of 
the  economy.  We  should  not  narrowly  construct  a 
notion  of  socially  productive  roles  so  that  these 
activities  are  not  properly  recorded.  SIPP  could 
do better, without a major investment in additional
measurements.

I  urge  the  designers  of  SIPP  to  reconsider  the 
labor force sequence in this broader setting. We
want to know the contributors and the dependents
of the society, not just the narrow answer to the
question:  "Did  you  receive  pay  for  employment 
last  week?" 

Two  other  truncation  issues  should  be  noted. 
Omission  of  employer  related  data  for  the  third 
employer  during  the  reference  period  distorts  the 
employment  problems  of  a  group  of  high  turnover 
employees  that  we  need  to  know  more  about.  The 
Bureau needs to justify this arbitrary truncation
of the data set. Secondly, and we have already
mentioned its importance, the failure to determine
the duration of labor force status at the
time  of  the  first  wave  of  interviewing  results  in 
left-censoring  of  all  periods  of  unemployment. 
This  is  most  unfortunate  for  the  study  of  spells 
of  unemployment,  which  are  of  great  interest  to 
the  program  agencies  providing  unemployment  com- 
pensation and  other  income  support. 

Response  bias 

Ryscavage  stresses  the  fact  that  comparison  of 
the  1984  and  1985  panels  will  expose  bias  due  to 
months  in  sample.  This  is  far  less  important 
than  the  recall  biases  that  can  be  studied  within 
the  rotations  of  each  panel.  Ryscavage  points  to 
work  suggesting  little  problem  with  recall  bias 
in  the  annual  labor  market  experience  measures. 
That  fact  may  not  be  relevant  to  the  cognitive 
task  facing  individuals  in  the  SIPP. 

The  year  is  a  well-understood  measure  of  time. 
Many  activities  relate  to  an  annual  cycle.  We 
have  many  assists  from  nature,  tax  records,  em- 
ployment rating  cycles,  and  other  events  that 
help  us  structure  an  answer  to  a  question  like 
"How  many  weeks  did  you  work  last  year?".  Most 
of  these  assists  are  lacking  for  the  SIPP  refer- 
ence period.  That  fact  implies  that  each  respon- 
dent must  laboriously  reconstruct  the  last  18 
weeks  to  answer  the  labor  force  questions.  For 
persons  with  intermittent  employment  this  may  be 
difficult.  I  would  argue  that  SIPP  places  a  much 
larger  cognitive  burden  on  the  respondent  than 
the  CPS  reports.  This  will  be  reflected  in  a  po- 
orer report  of  events  more  distant  in  the  past, 
and  that  response  bias  needs  to  be  estimated  and 
reported.  I  do  not  believe  that  we  can  interpo- 
late extent  of  response  errors  from  past  observa- 
tions of  annual  experience  in  relation  to  the  CPS 
measure  of  labor  force  from  the  monthly  survey. 

The  Four-month  Report 

While  it  is  true  that  data  are  collected  for  a 
four-month  reference  period,  it  is  not  necessary 
to  constrain  the  report  from  SIPP  to  a  four-month 
report  of  labor  market  experience,  as  Ryscavage 
suggests.  Indeed  the  likely  recall  problems  with 
dating  events  four  months  ago,  make  it  imperative 
to  study  reports  for  the  most  recent  month. 
Indeed  for  each  interview  there  is  one  report  of 
activity  during  the  CPS  reference  week  of  the 
prior  month.  A  first  task  for  SIPP  is  to  vali- 
date its  measures  against  CPS  for  that  week. 

4.  Matching  to  Economic  Data 
Haber  and  his  colleagues  make  it  clear  that  a 
limited  set  of  economic  variables  for  each  estab- 
lishment can  be  matched  to  every  worker,  and  a 
large  set  of  variables  match  to  those  workers  who 
are  employed  in  large  firms  and  particular  indus- 
tries. The  latter  data  are  updated  annually, 
while  data  on  small  establishments  come  from  per- 
iodic Censuses  (selection  in  time)  or  samples 
(selection of firms). There appear to be five
classes  of  potential  information  in  Table  1. 

Overlaid  on  this  censoring  of  the  available 
data  is  a  problem  of  incomplete  linking  data  in 
Table 2.

Haber appears to be on sound ground in advocating
analyses of labor supply using information on
firm payrolls, size of work force, and fringe benefits.
However, his suggestion that assets and
capital should be imputed for small firms where
only SSEL data are available is methodologically
unsound. No amount of imputation will establish
the covariances between worker and firm characteristics
that are censored by the lack of information in
lines 2.-6. in Table 1 for some establishments.
The only possible procedure is to study the nature
of the selection to determine probability of
inclusion for firms with different characteristics
(Little et al. 1983).

Table 1

     Characteristic of employer      Data
                                     (updating cycle,
     Industry      Size              in years)

 1.  All           All               SSEL(1)
 2.  Mfg.          N >= 250          ASM(1)
 3.                250 > N >= a      CM(5)
 4.                a > N             none
 5.  In-scope      All               EC(5)
                   N >= 500          Enterprise
                                     questionnaire

(N is the number of employees; a, an unknown parameter.)

Table 2
Linking data in SIPP

 EIN                                   yes; yes; no
 Aggregation required to impute data   none; none; enterprise level;
                                       2-digit SIC X N

Access  to  linked  data 

One  aspect  of  the  linked  file  that  deserves 
careful  thought  is  access  by  researchers  outside 
the  Census  Bureau.  It  is  clear  that  there  is 
such  broad  scope  for  research  using  a  linked  file 
that  many  students  of  industrial  organization  and 
labor  economics  will  wish  to  study  the  available 
data.  Monahan  (1983)  has  set  out  a  model  for 
public  access  using  the  LED.  That  model  should 
be  extended  to  all  SIPP  and  ISDP  administrative 
record  linkages.  The  model  for  general  access 
should  include,  in  addition:  A.  On-line  access 
by  the  public  to  a  test  sample  (which  could  be 
transformed  from  real  data  by  the  addition  of 
noise  and/or  the  reorganization  of  data  from  real 
firms  into  simulated  observations  obtained  by 
splicing vectors from several cases). B. Use of a
widely available statistical package as an interface
that users can manipulate to frame their
questions.
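
The two transformations suggested for a public test sample can be sketched as follows. Both functions are hypothetical illustrations, not a disclosure-avoidance procedure in use at the Bureau: `splice` builds each simulated record by drawing every variable from a randomly chosen real case, and `add_noise` multiplies selected variables by a random factor near one.

```python
import random

# Hypothetical confidential firm records.
REAL_FIRMS = [
    {"employees": 10, "payroll": 100.0},
    {"employees": 20, "payroll": 300.0},
    {"employees": 30, "payroll": 500.0},
]


def splice(records, n, seed=0):
    """Build n simulated records, taking each field from a random real case."""
    rng = random.Random(seed)
    fields = list(records[0])
    return [{f: rng.choice(records)[f] for f in fields} for _ in range(n)]


def add_noise(records, noisy_fields, relative_sd, seed=0):
    """Multiply the chosen fields by (1 + e), e drawn from N(0, relative_sd)."""
    rng = random.Random(seed)
    return [{k: (v * (1.0 + rng.gauss(0.0, relative_sd))
                 if k in noisy_fields else v)
             for k, v in record.items()}
            for record in records]


simulated = splice(REAL_FIRMS, 5, seed=1)
perturbed = add_noise(REAL_FIRMS, {"payroll"}, 0.05, seed=1)
```

Splicing keeps the marginal distribution of each variable but breaks the relationships between variables within a firm, so a file built this way would suit testing programs rather than substantive analysis.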

5.  Optimizing  the  SIPP  Resource 
SIPP  is  still  developing  as  a  data  resource. 
These  papers  indicate  that  conceptual  effort  ex- 
pended on  exploiting  the  resource  will  pay  off 
handsomely.  The  suggestion  that  SIPP  be  enhanced 
with  additional  questions  that  are  conditional  on 
change is extremely important. Not only questions
that are asked following migration, but
also  questions  that  are  asked  following  change  in 
employers,  change  in  family  structure,  and  entry 
into  the  universe  should  be  incorporated  into  the 
standard  core  questions.  The  Bureau  is  consider- 
ing an  "exit"  interview  to  be  administered  to 
persons  moving  out  of  the  sample  universe;  this 
will  be  extremely  important  both  for  following 
movers and for understanding the population on
whom we have data. I should also imagine that
agencies  would  be  interested  in  questions  that 
are  conditional  on  movement  on  or  off  social  wel- 
fare programs  because  that  information  would  pro- 
vide better  clues  as  to  the  incentive  effects  of 
those  programs  and  the  processes  by  which  indivi- 
duals cope  with  crises  in  their  own  lives. 

I  strongly  believe  that  the  Bureau  should  con- 
sider extending  the  SIPP  panel  beyond  8  waves  of 
interviewing.  A  probability  sample  of  the  indi- 
viduals in  each  panel  could  be  selected  to  be 
continued  for  an  additional  thirty  months.  If 
selections  are  made  to  include  all  the  poor,  per- 
sons in  families  undergoing  demographic  change, 
persons  with  unemployment,  then  the  continuing 
panel  would  be  substantially  enriched  with  per- 
sons undergoing  changes  that  are  of  interest  to 
the  social  welfare  and  tax  programs.  The  exten- 
sion of  the  sample  to  60  months  provides  a  suffi- 
cient set  of  data  that  meaningful  before-after 
studies can be made in relation to "rare" events
such as divorce, movement off the AFDC rolls, or
formation  of  a  new  household. 

It  seems  most  desirable  to  continue  discussions 
of  the  structure  of  SIPP  and  propose  conditional 
questions,  panel  extension,  and  linkage  to  Census 
and administrative data as substantial enhancements
to the current design.

DISCUSSION
Harold  Watts,  Columbia  University 


The  McMillen-Herriot  paper  presents  a  very 
valuable review and explorer's guide for a new and
relatively  uncharted  area.  It  is  true  that  panel 
data  have  been  available  for  some  time  for  house- 
hold samples  (15  years  or  more)  and  that  a  liter- 
ature on  the  topic,  which  is  cited  by  the  authors, 
goes  back  some  5  or  6  years.  But  the  problem  is 
difficult;  and  statistical,  cross-sectional  ways 
of  thinking  so  dominate  our  approaches  that  a 
recognized  solution  to  the  longitudinal  household 
problem  has  yet  to  be  found. 

Half  of  the  two-part  solution  proposed  here 
for  SIPP  has  been  implemented  by  the  Panel  Study 
of  Income  Dynamics  (PSID).  Both  logic  and  exper- 
ience have  supported  the  usefulness  of  the  "attri- 
bute" treatment  of  household  variables.  Individ- 
uals do  migrate  through  a  succession  of  house- 
holds and  treating  the  characteristics  of  the 
current  household  as  an  individual  time-varying 
attribute  provides  a  necessary  perspective  for 
analyzing  persons  longitudinally.  But  it  does 
not  solve  the  problem  of  continuity  for  a  house- 
hold. The  tentative  solution  offered  here  for 
that  task  distinguishes  between  family  and  non- 
family  households,  imposes  discontinuity  whenever 
a  unit  changes  from  family  to  nonfamily  status, 
and  recognizes  at  most  one  continuation  of  a  fis- 
sioning household.  This  definition  seems  plaus- 
ible and  viable,  but  clearly  it  will  take  some 
time  until  we  have  enough  experience  with  it  to 
make  a  judgement  about  how  useful  it  is,  or  even 
about  how  useful  a  longitudinal  household  is 
once  we  fully  digest  how  essentially  ephemeral 
it  is. 

In  this  regard,  I  would  like  to  reiterate  a 
plea  that  we  more  carefully  distinguish  between 
"household"  or  "co-resident  family  household" 
and  "family."  The  first  two  are  ephemeral  rela- 
tive to  individual  lifetimes,  at  least  in  our 
society.  But  a  family,  considered  as  a  web  of 
kinship  and  socioeconomic  ties  and  obligations, 
spans  generations  and  lifetimes.  I  am  not  yet 
arguing  that  we  should  try  to  measure  longitu- 
dinal family  units  in  this  dynastic  sense--but 
let's not call something else a "family" just
because  we  can  observe  it. 

In terms of defining income for longitudinal
household units, I am relatively confident that
satisfactory solutions can be found as long as
we recognize that the income picture is a lot
more complicated than any "Annual Income" can
begin to describe. Income is a flow concept,
and we can express any measure of flow as an
average annual rate no matter how long or short
the period over which it has been observed. In
order to aggregate properly, we have to integrate
the rates with the durations, and avoid
double counting, but that seems as tractable as
other things we now do.
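
The point about rates and durations can be made concrete with a minimal sketch. The amounts and spell lengths below are hypothetical, and the helper names `annual_rate` and `total_income` are illustrative assumptions, not part of any SIPP processing system. Each spell of income is expressed as an annual rate, and the total is recovered by weighting each rate by the fraction of a year it was observed.

```python
from fractions import Fraction

def annual_rate(amount, months):
    """Express income received over `months` months as an annual rate."""
    return Fraction(amount) * 12 / months

def total_income(spells):
    """Integrate (annual_rate, months) spells back into a total amount.

    Each rate is weighted by its duration in years, so no spell is
    counted twice and short spells are not overstated.
    """
    return sum(rate * Fraction(months, 12) for rate, months in spells)

# Hypothetical household: 4,000 received over 4 months, then 12,000 over 8 months.
spells = [(annual_rate(4000, 4), 4), (annual_rate(12000, 8), 8)]
rates = [rate for rate, _ in spells]
total = total_income(spells)
```

The two spells have different annual rates even though they sum to a single yearly figure, which is the kind of within-year variation an annual income total hides.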

The more challenging problem is one of describing
aspects of the income flow that have
been totally inaccessible to the CPS--trends
and instability in the rate of flow are quite
important for the behavior and welfare of both
individuals and households but are totally
obscured in an annual income figure. Consequently,
I do not feel that problems related
to mimicking that very limited description of
income that we have been stuck with for many
years should have much impact on the way we
choose to define households and describe their
income experience over time.

Turning to the McNeil-Salvo paper, we find a
new and interesting installment of the
estimated-earnings-function saga. It provides us with
a  new  estimate  of  the  amount  of  the  male/female 
earnings  differential  that  can  be  accounted  for 
by  the  shorter  experience  and  skill-depreciation 
that  results  from  intervals  out  of  the  labor 
force.  As  usual,  a  relatively  small  part  of  the 
total  variation  in  earning  rates  is  accounted  for 
but  experience  and  schooling  seem  to  have  sub- 
stantial and  fairly  stable  effects  for  both  males 
and  females. 

As  measured  by  the  authors  only  a  small  part 
of  the  male/female  earnings  differential  can  be 
attributed  to  the  difference  in  work  interrup- 
tions and  consequent  reduction  in  experience 
that  characterizes  the  female  work  force.  They 
estimate this by applying the male averages to
the female earnings function and find only about 12
percent of the gap accounted for by experience
and interruption. If one goes the other way and
puts the female averages into the male earnings
model, 34 percent is accounted for ($.87). I'm
not sure I know which way is right--but they
don't  give  the  same  answer.  Still,  two-thirds 
of  the  difference  is  left  unaccounted  for  and 
only  a  small  part  of  that  can  be  attributed  to 
the  included  education  variables.  The  explanation 
for  the  difference  lies  in  the  different  experi- 
ence pay-off  functions  estimated  for  men  and 
women.  Especially  for  women  with  no  interruption 
in  work,  the  pay-off  for  experience  is  extremely 
small.
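
A small sketch may help show why the two directions of this calculation need not agree. The linear earnings functions and averages below are entirely hypothetical (they are not the McNeil-Salvo estimates); they only illustrate that when the pay-off to experience differs between the two groups, the share of the gap "explained" by experience depends on whose earnings function is used.

```python
from fractions import Fraction as F

def predict(intercept, slope, experience):
    """Hypothetical linear earnings function: hourly rate given years of experience."""
    return intercept + slope * experience

# Assumed coefficients and mean experience, chosen for illustration only.
male = {"intercept": F(4), "slope": F(1, 5), "mean_exp": F(15)}
female = {"intercept": F(7, 2), "slope": F(1, 20), "mean_exp": F(10)}

gap = (predict(male["intercept"], male["slope"], male["mean_exp"])
       - predict(female["intercept"], female["slope"], female["mean_exp"]))

# Male averages applied to the female earnings function.
explained_f = (predict(female["intercept"], female["slope"], male["mean_exp"])
               - predict(female["intercept"], female["slope"], female["mean_exp"]))

# Female averages applied to the male earnings function.
explained_m = (predict(male["intercept"], male["slope"], male["mean_exp"])
               - predict(male["intercept"], male["slope"], female["mean_exp"]))

share_f = explained_f / gap   # share of the gap explained, female coefficients
share_m = explained_m / gap   # share of the gap explained, male coefficients
```

With these assumed values the female function attributes one-twelfth of the gap to experience and the male function one-third, so the asymmetry comes entirely from the different experience slopes.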

I  would  like  to  comment  more  generally  on  the 
potential  use  of  SIPP  for  this  kind  of  study. 
This  particular  study  does  not  take  advantage  of 
the longitudinal features in the ISDP, and I
feel  strongly  that  future  work  of  this  kind 
should  exploit  the  power  of  these  data  more 
fully. 

One  of  the  hypotheses  related  to  the  inter- 
ruption of  work  phenomenon  has  to  do  with  rate 
of  recovery  of  "depreciated"  earning  rates. 
ISDP  and  SIPP  both  provide  several  readings, 
over  time,  on  earnings  rates,  and  hypotheses 
about  rates  of  change  can  be  confronted  directly. 
A  longitudinal  data  base  with  historical  recall 
questions  can  also  do  much  more  than  indicate 
current  marital  status.  Duration  and  cumulative 
effect  of  statuses  like  marriage  and  recency  of 
change  can  be  brought  to  bear  on  more  highly 
articulated  models  for  testing  implications  of 
one  or  another  theories  about  the  earnings/ 
experience  relationship. 

It  is  unfortunate  for  these  particular  issues 
that  the  SIPP  measure  of  the  length  of  inter- 
ruptions remains  somewhat  crude  even  in  the 
latest questionnaire. An interruption "from"
1962 "to" 1962 can be anything from 6 to 12
months in duration; from 1962 to 1963, the range
is 6 to 24 months; from 1962 to 1964, it is 14
to 36 months. Such imprecision dilutes whatever
effect this variable may have. It would seem
that a question aimed at getting a start date
and duration or an end-point and duration would
be equally feasible and provide a more definite
interval of time.

This section is comprised of five papers presented
in  this  session  which  was  sponsored  by  the  Section 
on  Survey  Research  Methods. 


OBTAINING  CROSS-SECTIONAL  ESTIMATES  FROM  A  LONGITUDINAL  SURVEY: 

EXPERIENCES  OF  THE  INCOME  SURVEY  DEVELOPMENT  PROGRAM 

Hertz  Huang,  Bureau  of  the  Census 


I.   INTRODUCTION 

In 1975 the Secretary of the Department of
Health, Education and Welfare (the predecessor
agency of the Department of Health and Human
Services (HHS)) authorized a program, the Income
Survey Development Program (ISDP), to resolve
technical and operational issues for a major new
survey -- the Survey of Income and Program
Participation (SIPP). Much of the work of the ISDP
centered around four experimental field tests that
were conducted in collaboration with the Bureau of
the Census to examine different concepts,
procedures, questionnaires, recall periods, etc.
Two of the tests were restricted to a small number
of geographic sites; the other two were nationwide.
Of the two nationwide tests, the more important
data collection was called the 1979 Research
Panel. This panel consisted of nationally
representative samples which provided a vehicle
for feasibility tests and controlled experiments
of alternative design features. Information
concerning the ISDP may be found in Ycas and
Lininger (1981), David (1983), and the survey
documentation now available through the National
Technical Information Service (1983).

The 1979 Research Panel was a multiple frame
sample consisting of a general population (area)
sample of 9300 initially designated addresses
drawn from the 1976 Survey of Income and Education
(SIE) and some of the Census Bureau's current survey
reserve measures and new construction update, and
two list frame samples of (a) eligible applicants
for the Basic Educational Opportunity Grant (BEOG)
Program and (b) blind and disabled Supplemental
Security Income (SSI) recipients.

The 1979 Research Panel was a longitudinal
survey consisting of six waves of interviewing.
The sample was divided into three interviewing
panels. The first panel was first interviewed in
February 1979; the second panel was first
interviewed in March; and the third panel was first
interviewed in April. Each sample unit was
subsequently interviewed every three months. A
sample of addresses was chosen and persons living
in the sample units (addresses) during the first
wave of interviews were defined as original sample
persons. For interviews subsequent to the first,
the sample of addresses became a sample of
persons; accordingly, original sample people were
followed to their new addresses in subsequent
interviews (with reasonable geographic constraints
-- within 50 miles of any ISDP Primary Sampling
Unit). Personal interviews were conducted in Wave
1 with all adults (persons sixteen years old and
over) at the sampled address. These became the
original sample persons. During Waves 2-6 all
persons currently residing with an original sample
person were interviewed. This means, for example,
that if an original sample person moved to a new
address with four other adults, then questionnaires
were administered to everyone at the
original sample person's new address. If any
original sample person remained at the first wave
address, anyone who moved into that address with
the original sample person was also interviewed.
Thus, interviews were conducted with all adults at
an address as long as at least one of the adults
present was an original sample person. Because of
the ISDP rules, persons could be lost from the sample
because they moved beyond the survey's boundaries;
in addition, people were added to the sample
because they became part of the housing unit in
which the original sample person resided.

Obviously, the universe changes continuously
through the life of the survey. A great deal of
interest exists, however, in developing
cross-sectional estimates at the time of each interview
wave. In the absence of drawing a new sample at
each interview, any cross-sectional estimates
developed for Waves 2-6 are subject to a population
coverage bias. This paper will focus only on
the covered population and present some unbiased
base weights for cross-sectional estimators for
the non-institutionalized U.S. population
represented by the longitudinal sample (the population
coverage bias will remain, however, since units
containing no persons who were in the universe at
the time of Wave 1 cannot come into sample). Since
the methodology for treating both area sample and
list frame samples was needed for the ISDP 1979
Research Panel, both will be described below. The
estimation methods described here are directly
applicable to the Survey of Income and Program
Participation (SIPP), an overall description of
which is found in Nelson, McMillen, and Kasprzyk
(1984) and Herriot and Kasprzyk (1984).

II.  THE POPULATION FOR CROSS-SECTIONAL ESTIMATES

We begin by defining the general population for
which estimates are required. All households
existing during the first wave of interviews
(February through April 1979) are considered the
initial population. Based on the rules adopted
for following individuals who move, we have
essentially a longitudinal sample of persons as
well as households for the initial population.
Since no new sample was drawn at any subsequent
interview, the sample does not completely
represent the non-institutionalized U.S. population
after the first quarter of interviews. There were
persons who were in the following categories at the
initial interview time but became part of the
non-institutional population at a subsequent wave of
interviewing: 1) U.S. citizens living abroad, 2)
citizens of other countries who subsequently moved
to the U.S., 3) persons in institutions or armed
forces barracks. These persons will be called the
group R subpopulation, which did not have a chance to
be selected as original sample persons. At a
subsequent wave of interviews, the longitudinal
sample did not include any household in which all
current members were in the group R subpopulation.
However, persons in the group R subpopulation who
later joined households that included original
persons eligible for sampling in the first wave
were added to the cross-sectional universe. These
persons along with newborns will be called
"additions" in subsequent waves. In general,
"additions" are defined as persons moving into
eligible households after the first wave who were
not eligible for sampling in the first wave.

III.  GENERAL CONCEPT OF CROSS-SECTIONAL ESTIMATION

Due to the procedures adopted for following
movers in the 1979 Research Panel, at subsequent
interviews a household could consist of members
from more than one household in the universe at
the time of the first wave. The inclusion
probability of such a household would depend on
the inclusion probabilities of the households
which the members of the current household were
part of at the time of the first interview. The
inverse of the inclusion probability is usually
used as the weight of a sample household in
estimation. However, because of the sample design
of the 1979 Research Panel, the inclusion
probability of a household is a function of its primary
sampling unit, type of sampling frame, and the
1975 income of the household which occupied the
housing unit during the SIE interview. Only the
inclusion probability of an original sample
household was feasible to calculate. The inclusion
probability of an original nonsample household is
almost impossible to evaluate, but such households
can come into sample on later waves. Therefore,
some alternative weighting procedures needed to be
explored.

The idea to be presented in this discussion is
very simple. We will associate observations at
any given point in time with the known inclusion
probabilities of the original sample households.
We will split up observations belonging to a
household when current household members come from
more than one original household. A portion of
the observation is then associated with each
original household. The following example will
illustrate the idea: Assume that A and B are two
original households with inclusion probabilities
$\pi_A$ and $\pi_B$ respectively. At the first wave of
interviews, household A consists of five members,
a, b, c, d, and e, and household B consists of
three members, f, g, and h. During the second wave
of interviews we find that d, e, and f are living
together and form a new household, called household
C, while a, b, and c are still in household A
and g and h are still in household B.


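[Diagram: Household A (a, b, c, d, e) and Household B (f, g, h) at the first wave; d, e, and f form Household C at the second wave.]
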
Two alternatives are proposed, both involving the
division of household C into two parts; one part
is associated with household A and the other with
household B.

a)  Multiplicity Approach

Based on the number of ways (called
multiplicity) that the new household C can be
included in the sample, the observation
(additive, such as counts, income or values)
of household C (called $X_C$) is divided by the
number of original households involved (two in
this case) and each portion is added to the
corresponding observation of household A
(called $X_A$) and household B (called $X_B$).
Therefore, if both households A and B are
original sample households, the cross-sectional
estimate, x, for the total at the
second wave based on these three households
can be expressed as:

$$x = \frac{1}{\pi_A}\left(X_A + \frac{1}{2}X_C\right) + \frac{1}{\pi_B}\left(X_B + \frac{1}{2}X_C\right),$$

where $\pi_A$ and $\pi_B$ are the inclusion probabilities
of households A and B. Hence, the weight for the new
household C is $\frac{1}{2\pi_A} + \frac{1}{2\pi_B}$. If only
household A is an original sample household, then the
weight for the new household is $\frac{1}{2\pi_A}$; if only
household B is an original sample household, then the
weight for the new household is $\frac{1}{2\pi_B}$.

b)  Fair Share Approach

This approach assumes that all household
members contribute equally to their household.
Thus, the observation of household C is
divided into appropriate portions based on the
proportion of members of household C which
come from each original household (2/3 from
household A and 1/3 from household B in this
example). Therefore, if both households A and
B are original sample households, the cross-sectional
estimate for the total at the second
wave based on these three households is
expressed as

$$x = \frac{1}{\pi_A}\left(X_A + \frac{2}{3}X_C\right) + \frac{1}{\pi_B}\left(X_B + \frac{1}{3}X_C\right),$$

where $\pi_A$ and $\pi_B$ are the inclusion probabilities
of households A and B. Hence, the weight for the new
household C is $\frac{2}{3\pi_A} + \frac{1}{3\pi_B}$. If only
household A is an original sample household, then the
weight for the new household is $\frac{2}{3\pi_A}$; if only
household B is an original sample household, then the
weight for the new household is $\frac{1}{3\pi_B}$.

Since our sample was longitudinal in
nature, if the universe remained constant
through time, original sample persons would be
a representative sample of the universe at any
given point in time. Hence, using the
inclusion probabilities of the original sample
persons, the above estimators are unbiased
(proof is given in the next section). However, our
feasible target population (excluding the
group R subpopulation) changes through time by
including the "additions" (defined in Section
II). To compensate for this, we will include
these "additions" in the proposed estimators
below.
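
The two weighting rules for household C can be worked through numerically. The following is a minimal sketch, assuming hypothetical inclusion probabilities of 1/100 for household A and 1/50 for household B (the paper gives no numeric values); the function names are illustrative only.

```python
from fractions import Fraction

def multiplicity_weight(origins, sampled, pi):
    """Weight under the multiplicity approach.

    origins: dict mapping each original household to the number of
             current members who came from it.
    sampled: set of original households that were in the sample.
    pi:      dict of inclusion probabilities of the original households.
    """
    r = len(origins)  # number of original households involved
    return sum(Fraction(1, r) / pi[h] for h in origins if h in sampled)

def fair_share_weight(origins, sampled, pi):
    """Weight under the fair share approach (shares proportional to members)."""
    total = sum(origins.values())
    return sum(Fraction(n, total) / pi[h]
               for h, n in origins.items() if h in sampled)

# Hypothetical inclusion probabilities (an assumption for illustration).
pi = {"A": Fraction(1, 100), "B": Fraction(1, 50)}
# Household C: d and e came from A, f came from B.
origins_C = {"A": 2, "B": 1}

w_mult_both = multiplicity_weight(origins_C, {"A", "B"}, pi)
w_fair_both = fair_share_weight(origins_C, {"A", "B"}, pi)
w_mult_A_only = multiplicity_weight(origins_C, {"A"}, pi)
w_fair_A_only = fair_share_weight(origins_C, {"A"}, pi)
```

Under these assumed probabilities the multiplicity weight is 1/(2 x 1/100) + 1/(2 x 1/50) = 75, while the fair share weight is 2/(3 x 1/100) + 1/(3 x 1/50) = 250/3, so the two rules give different weights whenever the member counts from the original households are unequal.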


IV.   PROPOSED  ESTIMATORS  FOR  GENERAL  POPULATION 
(AREA)  FRAME 

Before the estimators are given, some notation
should be defined. Note that some of the defined
quantities may not be observed. For the first
wave of interview (time $t_0$), let

$X(t_0) = \sum_{k=1}^{N(t_0)} X_k(t_0)$ be the parameter to be
estimated, where $X_k(t_0)$ is the value of the
characteristic for the kth unit (which may be a
person or a household) and $N(t_0)$ is the number of
units in the universe at time $t_0$;

$\alpha_k$ = 1 if unit k was in the sample at time $t_0$,
k = 1, ..., $N(t_0)$,
= 0 otherwise;

$\pi_k$ = the probability that unit k was selected
for the sample at the first wave of interview
(time $t_0$)
= $\Pr(\alpha_k = 1) = E(\alpha_k)$, k = 1, ..., $N(t_0)$.

At a subsequent wave (time t), define for each
household i:

$S_i$ = the total number of current residents of
household i at time t;

$r_i$ = the number of original eligible households
from which the current residents come (does
not include households from which "additions"
come);

and let $S_{i1}, S_{i2}, \ldots, S_{ir_i}$ be the number of current
residents in household i from each of the $r_i$
original households and $S_{ia}$ be the number of
current residents from the "additions" as defined
in Section II. Write

$$S_i = \sum_{j=1}^{r_i} S_{ij} + S_{ia} \quad \text{and} \quad S_{i0} = \sum_{j=1}^{r_i} S_{ij}.$$

Also define N(t) as the total number of units
in the target population at time t, such as household
units (include all the original households
plus newly formed households) or other units based
on a group of persons such as families or
sub-families (include all sample persons, interviewed
nonsample persons, plus "additions"). And
let $X(t) = \sum_{i=1}^{N(t)} X_i(t)$ be the parameter (total) to
be estimated for the target population at time t.
To simplify the notation, we will replace N(t),
X(t) and $X_i(t)$ by N, X and $X_i$ respectively.

Based  on  the  general  concept  described  in 
Section  III,  two  cross-sectional  estimators  are 
proposed  for  the  area  frame  to  estimate  the  total 
of  a  characteristic  of  the  target  population  at 
time  t. 
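
Estimator I (Multiplicity Estimator):

This estimator is based on the multiplicity of
each current household:

$$x_M = \sum_{i=1}^{N} W_{Mi} X_i, \qquad W_{Mi} = \sum_{j=1}^{r_i} \frac{\alpha_j}{r_i \pi_j}.$$

Note that $\alpha_j$ and $\pi_j$ are associated with original
households but are reindexed within each current
household i. It is easily seen that

$$E(x_M) = E\left(\sum_{i=1}^{N} W_{Mi} X_i\right) = \sum_{i=1}^{N} \sum_{j=1}^{r_i} \frac{E(\alpha_j)}{r_i \pi_j} X_i = \sum_{i=1}^{N} \frac{X_i}{r_i} \left(\sum_{j=1}^{r_i} \frac{\pi_j}{\pi_j}\right) = \sum_{i=1}^{N} X_i = X.$$

Therefore $x_M$ is an unbiased estimator of X. Note
also that if $\alpha_j = 0$ it is not necessary to know
$\pi_j$, so that $W_{Mi}$ can be calculated based on the
selection probability only for sample units.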

Estimator II (Fair Share Estimator):

This estimator is motivated by the assumption
that all current household members contribute
equally to the household in which they reside for
the major household characteristic values, such as
earnings and welfare benefits.

$$x_F = \sum_{i=1}^{N} W_{Fi} X_i, \qquad W_{Fi} = \sum_{j=1}^{r_i} \frac{S_{ij}}{S_{i0}} \frac{\alpha_j}{\pi_j}.$$

As in the multiplicity estimator, $\alpha_j$ and $\pi_j$ are
associated with original households but are
reindexed within each current household i. One
can see that $x_F$ is also an unbiased estimator of
X as follows:

$$E(x_F) = E\left(\sum_{i=1}^{N} W_{Fi} X_i\right) = \sum_{i=1}^{N} \frac{X_i}{S_{i0}} \left(\sum_{j=1}^{r_i} \frac{S_{ij}\, \pi_j}{\pi_j}\right) = \sum_{i=1}^{N} X_i = X.$$

Note that if household j was not in sample at
time $t_0$, it is unnecessary to know the number of
current residents from original household j, $S_{ij}$,
in $x_F$ since $\alpha_j = 0$. Also note that because
"additions" are not included in the weight
calculations, they must be identified and excluded
when using either estimator.
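
The unbiasedness of both estimators can be checked by exact enumeration on a toy universe. This is a sketch under stated assumptions: the two original households, their inclusion probabilities, and the three current households with their values are all hypothetical, and the original households are assumed to be sampled independently purely so that every sample outcome can be enumerated (the argument itself only needs $E(\alpha_j) = \pi_j$).

```python
from fractions import Fraction
from itertools import product

# Hypothetical inclusion probabilities of the original households.
pi = {"A": Fraction(1, 4), "B": Fraction(1, 2)}

# Current households at time t: (members by original household, value X_i).
# Counts exclude "additions", so they sum to S_i0 for each household.
current = [
    ({"A": 3}, 30),
    ({"B": 2}, 20),
    ({"A": 2, "B": 1}, 45),
]

def weight(origins, alpha, fair_share):
    """W_Fi if fair_share is True, otherwise W_Mi, for one current household."""
    total = sum(origins.values())
    w = Fraction(0)
    for h, n in origins.items():
        share = Fraction(n, total) if fair_share else Fraction(1, len(origins))
        w += share * alpha[h] / pi[h]
    return w

def expected_estimate(fair_share):
    """Exact expectation of the estimator over all sample outcomes."""
    names = sorted(pi)
    expectation = Fraction(0)
    for draw in product([0, 1], repeat=len(names)):
        alpha = dict(zip(names, draw))
        prob = Fraction(1)
        for h in names:
            prob *= pi[h] if alpha[h] else 1 - pi[h]
        estimate = sum(weight(o, alpha, fair_share) * x for o, x in current)
        expectation += prob * estimate
    return expectation

true_total = sum(x for _, x in current)
```

Both `expected_estimate(False)` (multiplicity) and `expected_estimate(True)` (fair share) equal `true_total` here, although the estimates for any single sample outcome generally differ between the two rules.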

Comparison of Estimator I and Estimator II

Both Estimator I, $x_M$, and Estimator II, $x_F$,
are feasible to compute. We now compare them with
respect to both operational convenience and
reliability.

In order to compute $x_M$, the number of original
households eligible for sampling from which the
current residents came is needed. This information
is particularly difficult to obtain at each
successive wave of the survey. However, to compute
$x_F$, one only needs to know the number of current
residents in the household (excluding new
additions) and the number of residents from each
original sample household. This information could
be obtained from the 1979 Research Panel person
identifier without collecting additional
information.

The equal contribution from the members of a
household is a natural assumption. It reflects
better the actual share among the household members
in the absence of knowledge of the actual
contribution from each member. For example, without
knowledge of each person's age, employment status
and other needed information, it is more logical
to assume that earnings and welfare benefits are
equally contributed by household members than to
adopt any arbitrary way of defining household members'
shares. And, as will be seen below, $x_F$ can be
justified heuristically as the approximate
minimum variance unbiased estimator under what
seem to be natural assumptions, given a state of
ignorance about the actual shares of the household
members.

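Assume that at a subsequent wave (time t) three
households are generated from two original
households of the first wave of interview (time $t_0$) as
follows:

[Diagram: original households 1 and 2 at time $t_0$; at time t, household 3 is formed by members from both, while households 1 and 2 continue.]

Let $X_k(t_0)$ be the value of the characteristic for
household k at time $t_0$ and $X_j$, j = 1, ..., N, be that
value for household j at time t. Using Section III we
divide up $X_3$ in two parts, $fX_3$ and $(1-f)X_3$, and
then associate $fX_3$ with $X_1$ and $(1-f)X_3$ with $X_2$.
Without loss of generality, assume that a sample
size of 1 was selected at the first wave, $t_0$, with
probability $\pi_k$, k = 1, ..., $N(t_0)$. An unbiased
estimator of total, X, at time t can be written
as

$$x = \frac{\alpha_1}{\pi_1} X_1 + \frac{\alpha_2}{\pi_2} X_2 + \left(\frac{f\alpha_1}{\pi_1} + \frac{(1-f)\alpha_2}{\pi_2}\right) X_3,$$

where $\alpha_i$, i = 1, ..., $N(t_0)$, is defined at the beginning of
this section. Notice that both $x_M$ and $x_F$ are
special cases of x. The variance of x is

$$\mathrm{Var}(x) = \pi_1 \left\{\left(\frac{1}{\pi_1} X_1 + \frac{f}{\pi_1} X_3\right) - X\right\}^2 + \pi_2 \left\{\left(\frac{1}{\pi_2} X_2 + \frac{1-f}{\pi_2} X_3\right) - X\right\}^2 + \cdots$$

The remaining terms are not explicitly given here
since they are not functions of f. Var(x)
is minimized if

$$f = \frac{\pi_1 (X_2 + X_3) - \pi_2 X_1}{(\pi_1 + \pi_2) X_3}.$$

Since usually not both $\pi_1$ and $\pi_2$ are known and in
most of the surveys conducted by the Bureau of the
Census, the inclusion probabilities, $\pi_k$, are about
the same for all ultimate sampling units (even
though they are unequal in the 1979 ISDP), one may
simplify f to $f = (X_2 + X_3 - X_1)/2X_3$.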

Obviously, a weight defined as a function of
survey observations is not easy to implement. To
further simplify f, we assume the percentage
growth of X from $t_0$ to t is constant for all units
involved and define

$$a X_1(t_0) = X_1 + X_{31}$$

$$a X_2(t_0) = X_2 + X_{32}$$

$$X_3 = X_{31} + X_{32}$$

where $X_{3i}$ is the share of $X_3$ belonging to household
members from original household i, i = 1, 2.

Without knowledge of both $X_1(t_0)$ and $X_2(t_0)$, one
might naturally assume that the two initial
households are about the same, i.e., $X_1(t_0) = X_2(t_0)$,
and reduce f to $X_{31}/(X_{31} + X_{32})$.

Now if the contribution to $X_3$ is proportional
to the number of persons from each original
household, then $f = S_{31}/S_{30}$, as defined in $W_{Fi}$. This
result can be extended to any sample size as well
as to the case that the new household members are
from more than two original households. Therefore,
without knowledge of the actual contribution
from each household member, $\mathrm{Var}(x_F)$ is
smaller than $\mathrm{Var}(x_M)$ under these assumptions.
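
The minimizing value of f can be checked numerically. The sketch below assumes hypothetical values for $X_1$, $X_2$, $X_3$, the population total, and the inclusion probabilities; `optimal_f` and `var_terms` are illustrative names. It evaluates only the two variance terms that depend on f.

```python
from fractions import Fraction as F

def optimal_f(x1, x2, x3, pi1, pi2):
    """Value of f that minimizes the f-dependent part of Var(x)."""
    return (pi1 * (x2 + x3) - pi2 * x1) / ((pi1 + pi2) * x3)

def var_terms(f, x1, x2, x3, pi1, pi2, total):
    """The two terms of Var(x) that are functions of f (sample size 1)."""
    t1 = pi1 * ((x1 + f * x3) / pi1 - total) ** 2
    t2 = pi2 * ((x2 + (1 - f) * x3) / pi2 - total) ** 2
    return t1 + t2

# Hypothetical values, chosen for illustration only.
x1, x2, x3 = F(10), F(20), F(30)
total = F(100)                      # assumed population total X
pi1, pi2 = F(1, 10), F(1, 5)

f_star = optimal_f(x1, x2, x3, pi1, pi2)
v_star = var_terms(f_star, x1, x2, x3, pi1, pi2, total)
v_above = var_terms(f_star + F(1, 10), x1, x2, x3, pi1, pi2, total)
v_below = var_terms(f_star - F(1, 10), x1, x2, x3, pi1, pi2, total)

# With equal inclusion probabilities f reduces to (X2 + X3 - X1) / (2 X3).
f_equal = optimal_f(x1, x2, x3, F(1, 10), F(1, 10))
```

With equal probabilities and these values, f is 2/3, which matches the share $X_{31}/(X_{31} + X_{32})$ obtained when $X_1 + X_{31} = X_2 + X_{32}$ (here $X_{31} = 20$ and $X_{32} = 10$).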

V.  PROPOSED ESTIMATORS FOR LIST FRAMES

Since persons are the list frame sampling
units, we can divide all persons in the general
population into three groups based on their
relationship with the list frame under
consideration.

I)   Persons who are included in the list frame
(called list frame persons). For the SSI
list frame, this group includes all the
(under 65) recipients of Federal
Supplemental Security Income in December
1978; while for the BEOG list frame, this
group includes all the eligible applicants
for the Basic Educational Opportunity Grant
as of September 1978 for school year
1978-79.
II)  Persons who are not included in the list
frame but lived with a list frame person(s)
during the first wave of interview (February
through April 1979).
III) Persons who are not included in the list
frame and did not live with a list frame
person(s) during the first wave of
interview.

Both Group I and II had some chance to be
included in the list frame sample, but Group III
did not. The original (first quarter) households
which consist of Group I and/or Group II persons
will be called list frame households. As time
went on, some members of Group III moved in and
lived with person(s) belonging to Group I or II.
Such members of Group III will be "additions" for
the list frame, since they were not initially
eligible for sampling in the list frame. Note
that the type of persons already described as
"additions" for the general population (as defined
in Section II) will also be "additions" for the
list frame. For the following discussions, we now
define two types of "additions" for the list
frames: the "additions" that come from Group III
will be called "Group III additions" and the type
of "additions" as defined for the area frame will
be called "area frame additions."

If a list of recipients of a government
assistance program is used as a list frame then
Group III is usually fairly large. If we
construct our estimators the same way as we did for
the area frame, we will include many Group
III persons in our estimates at time t of
subsequent interviews. Consequently, we wouldn't
really know what "subpopulation" we were
estimating. In our opinion, it is not feasible to define
such a subpopulation at time t. Without a new
sample drawn each wave from the updated list, a
proper cross-sectional estimate for a list frame
subpopulation at time t is not likely, especially
if the turnover rate of the list frame members is
high. Therefore, we will restrict our
cross-sectional estimates to be based on only the
original list frame sample persons (that is, the
list frame persons selected for the list frame sample
plus all the persons who resided with them during
the first quarter of interview) and the "area
frame additions." In so doing we know that at any
time t, the target population we are estimating
consists of the original list frame subpopulation
(that is, Groups I and II) and the type of
"additions" as defined in the area frame. Note that
the original list frame subpopulation is
determined by persons who were on the list at the time
of sample selection. They may not be on the list
by the time of the initial interview.

In the 1979 ISDP panel, a household may have a
multiple chance of being selected for the list
frame sample if more than one list
frame person lives in that household at the first
wave of interview. (Some effort was made to
reduce multiple chances of selection for those
households in the SSI frame.) Therefore, the concept
of the base weight for the first wave of interview
is no longer trivial.

Similar to the area frame, we define

$$X(t_0) = \sum_{i=1}^{N_L(t_0)} X_i(t_0)$$

as the population parameter to be estimated from a list
frame sample at time $t_0$, where $X_i(t_0)$ is the value
of the characteristic for the ith unit in the list frame
subpopulation, which includes both Groups I and II
defined at the beginning of this section. Let

$\alpha_i$ = 1 if list frame person i is in the sample,
           = 0 otherwise (note that $\alpha_i = 0$ for all
             non-list frame persons)
$\pi_i$ = the probability that list frame person i is
          selected for the list frame sample for the
          first wave of interview (time $t_0$)
        = $\Pr(\alpha_i = 1) = E(\alpha_i)$
$B_{0j}$ = the number of list frame persons (indexed
           by i) in the jth household at time $t_0$.

Then the base weight at time $t_0$ for the jth household
and its residents is defined as

$$W_{0j} = \frac{1}{B_{0j}} \sum_{i=1}^{B_{0j}} \frac{\alpha_i}{\pi_i},$$

where $\alpha_i$ and $\pi_i$ are associated with list frame
persons but are reindexed within household j.
For time t of a subsequent wave, let

$B_k$ = the total number of list frame persons living in
        the original (time $t_0$) list frame households
        which the current residents of the kth household
        come from.

$S_k$ = the total number of current residents at time t;
        let $S_{k1}, S_{k2}, \ldots, S_{kr_k}$ be the number
        of current residents in the kth household who come
        from each of $r_k$ original list frame households;
        $S_{ka}$ is the number of current residents of the
        kth household who are from the "area frame
        additions"; and $S_{kIII}$ is the number of current
        residents of the kth household who are from the
        "Group III additions." Therefore

$$S_k = \sum_{j=1}^{r_k} S_{kj} + S_{kIII} + S_{ka} = S_{kc} + S_{ka}.$$

$N_L$ = the total number of units, such as household or
        family units, in the list frame universe at time t.

The two cross-sectional estimators for the total of a
characteristic of the list frame target population at
time t are as follows:

Estimator I (Multiplicity Estimator)

To avoid estimating "Group III additions" we will treat
all the current residents from the "Group III additions"
as a separate list frame sampling unit. Therefore, in
the kth household at time t, there are $B_k + U_k$ list
frame sampling units, where $U_k = 1$ if some of the
current residents in the kth household are from "Group
III additions," 0 otherwise. The multiplicity estimator
for the list frame population total is given in the
following:

$$X'_L = \sum_{k=1}^{N_L} X_k(t)\, W_{Mk}, \qquad
  W_{Mk} = \frac{\sum_{i=1}^{B_k} \alpha_i / \pi_i}{B_k + U_k},$$

where $\alpha_i$ and $\pi_i$ are associated with original
list frame persons but are reindexed within each current
household k.

Estimator II (Fair Share Estimator)

Motivated by the assumption that all current residents
contribute equally to a household we propose the
following list frame estimator:

$$X''_L = \sum_{k=1}^{N_L} X_k(t)\, W_{Fk}, \qquad
  W_{Fk} = \frac{1}{S_{kc}} \sum_{j=1}^{r_k} S_{kj} W_{0j},$$

where $S_{kj}$ and $W_{0j}$ are associated with original
households but are reindexed within each current
household k.




These two estimators are constructed to estimate the
list frame subpopulation excluding the "Group III
additions." They are not unbiased estimators in the
global sense, that is, independent of the value of the
characteristic of interest. However, the fair share
estimator would be unbiased under the assumptions that
all current residents contribute equally to a household
and that a household is treated as a fraction of a
household if some of the current residents are from
"Group III additions."

In addition to the "unbiasedness" described above, the
fair share estimator $X''_L$ is also preferred for the
same reasons (operational and reliability) stated in the
area frame. In computing $X''_L$, we need to know
$B_{0j}$, the number of list frame persons in a sample
household at the initial interview (time $t_0$). This
information was not difficult to obtain. And at any
subsequent wave of interview time t, we needed to know
only $S_{kc}$, the total number of current residents who
are not "area frame additions," and $S_{kj}$, the number
of current residents from each original list frame
sample household. The latter can be obtained through the
person identifier.

However, in order to compute the multiplicity estimator
$X'_L$ at time t we would have to ask a complicated
question to obtain $B_k$, the total number of list frame
persons living in the original households from which the
current residents come.
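
The data requirements of the two weights can be compared
in a minimal Python sketch. It assumes (these forms are a
reading of the definitions in this section, not a
verified transcription) that the base weight averages
alpha_i / pi_i over the B_0j list frame persons of a
household, that the multiplicity weight divides the same
sum by B_k + U_k, and that the fair share weight is the
S_kj-weighted sum of base weights divided by S_kc. All
function and argument names are hypothetical.

```python
from fractions import Fraction

def base_weight(alpha, pi):
    # alpha[i] = 1 if list frame person i was selected, else 0;
    # pi[i] = that person's wave 1 selection probability.
    return sum(Fraction(a) / p for a, p in zip(alpha, pi)) / len(alpha)

def multiplicity_weight(alpha, pi, has_group3):
    # alpha and pi cover all B_k list frame persons living in the
    # original households that feed current household k.
    b_k = len(alpha)
    u_k = 1 if has_group3 else 0
    return sum(Fraction(a) / p for a, p in zip(alpha, pi)) / (b_k + u_k)

def fair_share_weight(s_kj, w_0j, s_k3=0):
    # s_kj[j] = current residents from original household j,
    # w_0j[j] = base weight of that household,
    # s_k3 = current residents from "Group III additions".
    s_kc = sum(s_kj) + s_k3
    return sum(s * w for s, w in zip(s_kj, w_0j)) / s_kc

p = Fraction(1, 100)
w_a = base_weight([1, 0], [p, p])   # household A: one of two selected
w_b = base_weight([0], [p])         # household B: not selected
w_fair = fair_share_weight([2, 1], [w_a, w_b])
w_mult = multiplicity_weight([1, 0, 0], [p, p, p], has_group3=True)
```

Note that fair_share_weight needs only counts of current
residents and the wave 1 base weights, whereas
multiplicity_weight needs the selection status of every
list frame person in every contributing original
household.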
VI.  SUMMARY 

These two estimators were constructed based on the
specific procedure of following movers in the 1979 ISDP.
However, they can be easily modified to apply to other
designs and procedures. The fair share estimator was
actually used for the 1979 ISDP. It is also being used
for the 1984 Survey of Income and Program Participation.

As noted in Section III, the inverse of the inclusion
probability of a household at time t is usually used as
the weight of that household to obtain an unbiased
estimator. When a household consists of members from two
original households (called households i and j), the
inclusion probability of this new household is
$\pi_i + \pi_j - \pi_{ij}$, where $\pi_{ij}$ is the joint
selection probability of households i and j at the first
wave of interview. This inclusion probability is
operationally impossible to obtain, but its inverse can
be reduced to the weight of the multiplicity estimator in
most surveys conducted by the Census Bureau. In these
surveys, the wave 1 inclusion probabilities are almost
the same for all ultimate sampling units, and the joint
selection probabilities, which arise from sampling
without replacement within PSUs, are generally
negligible. Therefore, the fair share estimator not only
overcomes the trouble of obtaining such inclusion
probabilities, but it also has good variance properties
under some reasonable conditions and is easy to
implement.
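
As a hedged illustration of this reduction, suppose (an
assumption made only for this sketch) that both original
households have the same wave 1 inclusion probability
$\pi$ and that $\pi_{ij}$ is negligible relative to $\pi$.
Then

```latex
\frac{1}{\pi_i + \pi_j - \pi_{ij}}
  \;\approx\; \frac{1}{2\pi}
  \;=\; \frac{1}{2}\left(\frac{1}{\pi} + 0\right),
```

which is the average of the two original households'
inverse-probability weights when exactly one of them was
selected (weight $1/\pi$) and the other was not
(weight 0).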

As described in Section V, the application of this
approach to multiple frame longitudinal surveys presents
additional problems, and the resulting estimators are not
nearly as satisfactory.


This research was completed before the first interviews
of the 1979 ISDP Research Panel. Horvitz and Folsom
(1980) have done similar work in conjunction with the
National Medical Care Utilization and Expenditure Survey.
I.     INTRODUCTION

Since October of 1983, the Census Bureau has been
conducting interviews for a new survey, the Survey of
Income and Program Participation (SIPP). The survey will
effect long-sought improvements in the measurement of
annual income and the complex relationships between
income flows, labor force participation, participation
in government programs such as welfare, and tax policy.
One of the products of the interviewing will be a set of
longitudinal records on a probability sample of the
population. The subject we address in this paper is the
weighting of these longitudinal records so that the data
may be analyzed.

We are aware of only two precedents for this weighting.
They are the National Medical Care Expenditure Survey
(NMCES) and the National Medical Care Utilization and
Expenditure Survey (NMCUES). The latter was conducted
jointly by the Research Triangle Institute and the
National Opinion Research Center [2]. Some work was done
on the problem for the Income Survey Development Program
(ISDP) [6], but it was not implemented. The techniques
used by them are among those under consideration for
SIPP. Naturally, though, we are also considering some new
ideas. These ideas are still in a very preliminary form.
We are presenting them here to get early reaction and
suggestions from the statistical community.

Our general approach consists of three major steps. The
first step is to derive an unbiased weight for each
longitudinal record. This is not as straightforward as it
seems because a slightly different set of people is being
interviewed each month. Section III discusses this step.

The second step is to make adjustments for those records
that are incomplete. We will use imputation when part of
an interview is missing. (See Samuhel's paper in this
session [3].) We will also probably use imputation when a
whole interview is missing and the missing interview is
bracketed by good interviews. Our research on adjusting
for records with more than one missing interview is in
too preliminary a stage to report on. (One proposal has
been made by Little and David [4].)

The third step is to correct for disproportional
representation of demographic types to reduce variance
and gain some consistency with the Current Population
Survey (CPS). Section IV discusses this step.

Before discussing the weighting, it is essential that we
define which of the many possible longitudinal universes
is the universe for which estimates are to be provided.
Section II deals with this problem.

Finally, we mention some of the important features of the
design of SIPP. For more details, the reader is
encouraged to first read an overview of the survey [5].
Roughly 20,000 households were interviewed between
October 1983 and January 1984, inclusive. That set of
interviews constitutes the first wave of the 1984 panel
of SIPP. The Census Bureau will try to interview the
persons in those households an additional seven or eight
times in four-month waves, even if they move. We will
also interview any persons who "usually reside" with
anyone in the original cross-section for at least
one-half of a calendar month. This extra interviewing
will only be conducted for the time period that the joint
residence is maintained. Only the original cross-section
is followed through moves.

II.      DEFINING  THE   LONGITUDINAL   UNIVERSE  OF 
PERSONS  FOR  SIPP 

The SIPP universe at the beginning of any panel is
persons who are members of the civilian noninstitutional
population, and members of the military not living in
barracks on bases. Defining the longitudinal universe is
somewhat more complicated. We begin by defining the
possible ways persons can enter and exit this universe.
Next we discuss the relationship between the
cross-sectional universes and the longitudinal universe.
The third topic of this section is the definition of
table universes and the calculation of annual income for
persons in the longitudinal universe.

There are two ways persons can enter the SIPP universe:
1) persons can move from overseas (immigrate or return),
from institutions, or from military barracks; 2) persons
can be born to members of the universe.

Similarly, there are two ways of exiting the universe:
1) moving overseas, to an institution, or to military
barracks; 2) dying. Given these conditions of entering
and exiting the universe, and a definition of the initial
universe, we can define the universe at any subsequent
point in time, and the means by which the universe grows
and diminishes over time. The next problem is to make the
transition from the cross-sectional universes to a single
longitudinal universe.

There are three methods of defining a longitudinal
universe: 1) the composition can be fixed at some point
in time; 2) the universe may be defined as the union of
some set of cross-sectional universes; and 3) the
universe may be defined as the intersection of some set
of cross-sectional universes.

A longitudinal universe may be defined at a given point
in time. For example, we can take the civilian
noninstitutional population at the time the sample is
drawn, at the midpoint of the panel duration, or at the
end of the panel to define the universe of interest. Of
course, the time point chosen could be any time point
within the duration of the panel. This rather narrow
definition of the universe has an advantage in its
simplicity, but also several disadvantages. Depending on
the chosen point in time, this definition produces a
strictly declining population, a first increasing and
then decreasing population, or a strictly increasing
population. In the first case all entrants are excluded
from the longitudinal universe, and only exits are
allowed to alter the universe. In the second case, entry
is allowed and exit is denied until the midpoint, when
the situation reverses. In the last case, all those who
exit during the panel are excluded from the longitudinal
universe and only entries are allowed to alter the
universe. In addition, it is difficult to argue why one
point or another should be chosen as the point in time to
define the universe, and for some purposes one may need a
different point than the one originally chosen.

The next two definitions build on the above idea that a
universe may be defined at any point during the panel.
Let us assume then a set of universes, each defined at a
different point in time. To further simplify discussion,
let us assume a set of twelve monthly universes defined
at the midpoint of each month. The two options are to use
either the union or the intersection of these sets.

Consider first the union of sets. The union of these
monthly universes is all persons who were at some point
during the year members of the civilian noninstitutional
population. In other words, all members of the target
population plus all persons who enter or exit during the
year are included in the union of sets definition. This
is the most inclusive of the universe definitions offered
here, and the one which best captures the dynamic
characteristics of the population. Some of the
disadvantages of this type of definition will be raised
below in the discussion of tabulations and table
universes.

An alternative to the union of sets is the intersection
of the set of twelve monthly cross-sectional universes.
Here we include in the longitudinal universe only those
persons who were members of all of the cross-sectional
universes; in other words, only those persons who were
members of the civilian noninstitutional population or
the special military categories on the fifteenth of each
of the twelve months. This definition is even more
restricted than the point-in-time definition. This
intersection of sets definition produces a static
population. That is to say, there is no entering or
exiting allowed.

Of the three longitudinal definitions offered here, only
the union of sets incorporates the dynamic qualities that
are inherent in a longitudinal process.
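
The three definitions can be sketched with ordinary set
operations. The example below is illustrative only: it
uses three monthly universes rather than twelve, and the
person identifiers and the dictionary name `monthly` are
hypothetical.

```python
# Month number -> set of persons in the universe at mid-month.
monthly = {
    1: {"ann", "bob", "cal"},
    2: {"ann", "bob", "cal", "dee"},  # dee enters the universe
    3: {"ann", "cal", "dee"},         # bob exits the universe
}

# Point-in-time definition, fixed here at the first month.
point_in_time = monthly[1]

# Union: in the universe at some point during the period.
union = set().union(*monthly.values())

# Intersection: in the universe in every month of the period.
intersection = set.intersection(*monthly.values())
```

Only `union` contains both the entrant and the person
who exits; `intersection` contains neither.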

That would seem to make it the logical choice; however,
this is also the definition that produces the most
complications when tabulating data. Consider, for
example, tabulating marital status at the beginning of
the year with marital status at the end of the year.
There is no place in such a table for persons who were in
the universe at one point in time and not in the universe
at the other point in time. For the union of sets
definition there is a need for both a column and a row
for persons not in the universe at time 1 or not in the
universe at time 2. For those definitions that allow
exiting only, a column for persons not in the universe at
time 2 is necessary as long as the beginning point of the
universe and the tables are the same.
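
A minimal sketch of such a tabulation under the union
definition follows. The records and the category label
are hypothetical; `None` marks a person who was outside
the universe at that time point.

```python
from collections import Counter

NOT_IN_UNIVERSE = "not in universe"

# Person -> (marital status at time 1, marital status at time 2).
records = {
    "ann": ("married", "married"),
    "bob": ("single", "married"),
    "cal": ("married", None),   # exited during the year
    "dee": (None, "single"),    # entered during the year
}

# Count each (time 1, time 2) cell, routing persons outside the
# universe at either time into the extra category.
table = Counter(
    (start or NOT_IN_UNIVERSE, end or NOT_IN_UNIVERSE)
    for start, end in records.values()
)
```

Without the extra category, the entrant and the person
who exited would have no cell in the table.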

Similar problems arise in computing annual income.
Aggregating across months is simple, but it is not clear
how to compare income amounts for full-year and part-year
persons. That is simply to say that a $6,000 income for
6 months and a $6,000 income for 12 months are not the
same.
III.  INITIAL  WEIGHTING

For SIPP, as for ISDP, a cross-section of the population
will be followed for a period of time. Data will also be
collected on the people that the original cross-section
live with. The original idea was that only the data on
the people in the original cross-section would be used in
person longitudinal tabulations; the data on the other
people would be used only to provide the "household
experience" of the original cross-section. We are now
reexamining that idea. The data on the other people can
be used to better understand the experience of new
entrants to the SIPP universe. Furthermore, there are
ways to use these data more intensively to gain valuable
variance reductions. Unfortunately, these procedures
require strong assumptions for unbiasedness. In the
following sections, we explore the trade-off. We first
discuss whether the data on the other people should be
used. We then discuss how to construct weighting
procedures that use these data more or less intensively.
A.  Variance  Reduction  Versus  Bias  Control. 

Let us first define some terms and clarify the type of
parameters to be estimated. We divide the sample people
into three groups. A person is an original sample person
if he/she is a member of the original cross-section.¹ A
person is an associated sample person if he/she was a
member of the eligible population at the time the
cross-section was selected but happened not to be
selected. Anyone else is an additional sample person.
This last group consists of recent discharges from
institutions, new immigrants, and people moving out of
military barracks. The type of parameter to be estimated
is the frequency of some pattern of labor force
participation, program participation, income receipt,
etcetera, by demographic characteristics, housing
characteristics, geographical unit, educational
background, etcetera. A simple example is the frequency
of women who were receiving public assistance in January
1984 but were not receiving it in December 1984.

¹ A person in the original cross-section of households
who was 15 years old or older at the time of the first
interview is definitely an original sample person.
Twelve, thirteen, and fourteen year old children are more
difficult to classify. At first, no questionnaires are
filled out for them and they are not followed in the rare
event of an unaccompanied move. However, after they turn
15, they are treated the same as any other original
sample person. We will treat them here as original sample
people. Children eleven or younger are not classified at
all.

The original idea was to estimate parameters like this
one by summing the weights of all original sample persons
with the desired characteristics. Data on associated and
additional sample people are needed only to classify
original sample people with respect to household
characteristics; for example, was the original sample
person living in a household in which at least one member
received social security? Given this scheme, no data are
needed on associated or additional sample people for the
period that they do not reside with original sample
people. Hence, we do not follow associated or additional
sample people if they separate from original sample
people. Clearly then, the data on associated and
additional sample people are frequently incomplete.

Despite this incompleteness, we are now considering ways
to squeeze more information out of these data. The first
way is to provide estimates for the "union" universe
using the data on additional sample people. The second
way is to use the data on both types to reduce variances.
To begin the argument for this second use, we first point
out that for shorter time periods these data are
frequently either complete or nonexistent. (Throughout
this section, by complete we mean complete ignoring
nonresponse.) This is always true for 1 month periods,
usually true for 3 month periods, and frequently true for
12 month periods. For example, suppose that Ruth is an
original sample person interviewed in October 1983. In
November, she marries Jack, who was in the October SIPP
universe. They stay together at least through April 1985.
Then Jack is an associated sample person on whom we have
complete 1984 data. Alternatively, suppose that Jack was
living in a military barracks in October 1983. Then he is
an additional sample person on whom we have complete 1984
data. There will obviously be many more cases in these
complete categories for 1985 data. Furthermore, there
will be many cases where we are only missing one or two
months of data.

Intuitively, it seems wasteful to give zero weights to
these cases with complete or almost complete data, as
originally intended. On the other hand, zero weights must
be assigned to the seriously incomplete cases to avoid
large-scale imputation. One possible solution is obtained
by initially assigning strictly positive weights to all
cases, including those that are incomplete due to field
procedures, and then treating the incomplete cases as if
they were caused by nonresponse. Imputation would be used
for the almost complete cases. Note then that the
seriously incomplete cases would have zero weights, while
the other cases would have positive weights. If enough
data have been collected on the associated and additional
sample people to correctly model the probability of this
type of nonresponse, then we would still have unbiased
estimators.

An example of the type of model required is that,
starting from a given socioeconomic stratum, the new
economic situation of a male divorcee does not depend on
whether he or his ex-spouse was the original sample
person. Here we stress that if a person has responded to
even a single wave of SIPP, then we have an extraordinary
wealth of data available for modeling.
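
One common way to treat cases "as if they were caused by
nonresponse" is a weighting-class adjustment, sketched
below. This is an assumption for illustration, not a
procedure stated in the text: the cells stand in for
whatever model of incompleteness is adopted, and the
function name and record keys are hypothetical.

```python
from collections import defaultdict

def adjust_for_incompleteness(cases):
    """cases: list of dicts with keys 'cell', 'weight', 'usable'.

    Seriously incomplete (unusable) cases get weight zero; their
    weight is redistributed to usable cases in the same cell.
    """
    total = defaultdict(float)
    usable = defaultdict(float)
    for c in cases:
        total[c["cell"]] += c["weight"]
        if c["usable"]:
            usable[c["cell"]] += c["weight"]
    adjusted = []
    for c in cases:
        if c["usable"]:
            factor = total[c["cell"]] / usable[c["cell"]]
            adjusted.append(c["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted

cases = [
    {"cell": "A", "weight": 100.0, "usable": True},
    {"cell": "A", "weight": 100.0, "usable": False},
    {"cell": "A", "weight": 200.0, "usable": True},
    {"cell": "B", "weight": 50.0, "usable": True},
]
adjusted = adjust_for_incompleteness(cases)
```

The adjustment preserves the weighted total within each
cell, which leaves estimates unbiased only if usable and
unusable cases in a cell are alike in expectation.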
Future  Study 

Of course, we will never know for certain whether such a
model is correct. There is a risk of biasing the
estimators, and as a rule the Bureau is willing to risk
biases for decreases in variance only if there is some
evidence that the bias squared is substantially less than
the variance decrease. Our plans at this time are not
well formulated. A reasonable first step is to quantify,
for each proposed weighting procedure, the frequency of
positively weighted incomplete cases by the severity of
the incompleteness. The only source for this information
is the ISDP. We are currently working on ways to get
appropriate tabulations for it.
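
One way to read this rule, assuming the criterion is mean
squared error (the text does not say so explicitly), is
the following. Here $\hat{x}_0$ denotes an unbiased
estimator and $\hat{x}_1$ a possibly biased alternative
with smaller variance; both symbols are introduced only
for this sketch.

```latex
\mathrm{MSE}(\hat{x}_1) = \mathrm{Var}(\hat{x}_1) + \mathrm{Bias}(\hat{x}_1)^2
\;<\; \mathrm{Var}(\hat{x}_0) = \mathrm{MSE}(\hat{x}_0)
\quad\Longleftrightarrow\quad
\mathrm{Bias}(\hat{x}_1)^2 \;<\; \mathrm{Var}(\hat{x}_0) - \mathrm{Var}(\hat{x}_1).
```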
B.  Construction of Unbiased Weighting Procedures

Below we present a very simple result that characterizes
a general class of unbiased procedures. Reflection on
this result quickly helps one to understand that there
are infinitely many unbiased procedures. Most of them are
totally inappropriate, but it is very possible that
better and radically different weighting procedures exist
than have yet been conceived.

Let $x = \sum_{i=1}^{N} x_i$ be the parameter of interest
to be estimated, where $x_i$ is the value of the
characteristic for the ith unit. Let $w_i$ be a random
variable associated with the ith unit such that
$E(w_i) = 1$. Then $\hat{x} = \sum_{i=1}^{N} w_i x_i$ is
an unbiased estimator of x, since

$$E(\hat{x}) = E\Big(\sum_{i=1}^{N} w_i x_i\Big)
  = \sum_{i=1}^{N} E(w_i)\, x_i = \sum_{i=1}^{N} x_i = x.$$

If the probability of selection is known for all units,
it is common to take

$w_i$ = the inverse probability of selection if the ith
        unit is in sample;
      = 0 otherwise.

This definition of $w_i$ is not, however, necessary. In
this case it is impossible since the probabilities are
unknown.
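
The result can be checked exactly on a toy population.
The sketch below assumes independent selection of units
with known probabilities and uses the common
inverse-probability weights; the values in `x` and `p`
and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import product

x = [3, 5, 8]                                         # values x_i
p = [Fraction(1, 2), Fraction(1, 4), Fraction(4, 5)]  # Pr(unit i selected)

def expected_estimate(x, p):
    """Exact E(sum of w_i * x_i), with w_i = 1/p_i if selected, else 0.

    Enumerates every possible sample under independent selection.
    """
    total = Fraction(0)
    for sample in product([0, 1], repeat=len(x)):
        prob = Fraction(1)
        for s, pi in zip(sample, p):
            prob *= pi if s else 1 - pi
        estimate = sum(xi / pi for xi, pi, s in zip(x, p, sample) if s)
        total += prob * estimate
    return total
```

Because E(w_i) = p_i * (1/p_i) = 1 for every unit,
`expected_estimate(x, p)` equals `sum(x)` exactly. Any
other choice of weights with expectation one would pass
the same check.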

Each  sample  person  has  a  cross-sectional 
weight  for  every  month  that  they  are  In  the 
universe.  These  cross-sectional  weights 
have  expected  value  of  unity,  are  strictly 
positive  for  the  months  that  the  person  1s 
1n  sample,  and  are  zero  for  the  months  that 
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Interval  but  after  the  mid  date  are 
assigned  their  respective  cross-sectional 
weights  as  of  the  date  they  enter  1t,  as 
In  Procedure  1  and  2.  Persons  who  leave 
the  universe  before  the  "mid"  date  are 
assigned  their  respective  cross-sectional 
weights  as  of  the  date  they  leave  It. 

Procedure    4.       Average    Cross-Sectional 

Weight  (ACS) 


Height 
Each 


Each    person    redeves    a    longitudinal 
weight  valid  for  a  specific  time  Inter- 
val.    Persons  that  remain  In  the  universe 
throughout  the  Interval  are  assigned  the 
average    of    their    respective    monthly 
cross-sectional   weights.     Persons  that 
enter  or   leave  the  universe  are  assigned 
the  average  of  their  respective  monthly 
cross-sectional  weights  for  the  months 
they  were  1n  the  universe  during  the  time 
Interval.     Positive  weights  are  assigned 
to  all    sample  persons.     A  more  formal 
definition  1s  given  below. 
Let  U^   *  number  of  months  the  1th  person 
was   1n  the  universe  during  the 
specified  time  Interval 
Let  C\   *  sum  of  the  monthly  cross-sec- 
tional weights  of  the  1tn  person 
1n  the  specified  time  Interval 
Then  the  person   longitudinal   weight  1s 
Cl/UV 
C.     Comparison  of  Procedures 

In  this  section  we  describe  1n  detail  the 
types  of  complete  and  Incomplete  cases  that 
are  used  by  each  procedure.     First,  we  need 
to  define  some  notation.     Let 
tp  ■  the  first  month   that  a  person  1s  1n 

the  universe, 
t£  ■  the  last  month  that  a  person  1s  1n  the 

universe, 
t\  ■  the  first  month  that  a  person  1s  1n 

sample, 
t2  *  the   last  month  that  a  person  1s   1n 

sample, 
tm   *    the   mid-month    of    the    Interval    of 

Interest. 
The  description  1s  given  1n  Table  1.  The 
first  14  cases  comprise  the  "Intersection" 
universe.  The  remaining  32  cases  fill  out 
the  "union"  universe.  Each  case  1s  marked 
as  having  complete,  partial  or  no  data  for 
the  interval  of  Interest.  Of  course,  all 
of  this  1s  assuming  perfect  response.  The 
only  type  of  mlsslngness  that  we  are 
discussing  here  1s  that  caused  by  opera- 
tional procedures.  On  the  right,  there  1s 
a  column  for  each  procedure  with  an  "X"  1f 
the  procedure  uses  the  case. 

The  entry  date  procedure  uses  the 
perfect  cases  1,15,17,  and  18,  but  does  not 
use  the  perfect  cases  2  and  16;  the  partial 
cases  3,5,  and  19-27;  and  cases  12  and  44 
for  which  no  relevant  data  exists.  The 
beginning  date  of  Interval  and  mid  date  of 
Interval  procedures  both  use  all  of  the 
perfect  cases,  more  of  the  partial  cases 
and  none  of  the  completely  missing  cases. 
He  thus  think  that  these  two  procedures 
will  tend  to  yield  smaller  variances  than 
the  entry  date  procedure  with  possibly  some 
small  Increase  in  the  risk  of  bias.  The 
average  cross-sectional  procedure  1s  the 


the  person  1s  not  In  sample.  By  choosing 
the  longitudinal  weight  to  be  the  cross- 
sectional  weight  at  a  particular  time  or 
the  average  of  the  cross-sectional  weights 
at  several  points  1n  time,  we  can  construct 
longitudinal  weighting  procedures  that  use 
different  subsets  for  the  overall  data  set. 
In  this  section  we  present  four  longitu- 
dinal weighting  procedures  for  computing 
unbiased  estimates  for  persons.  They  are 
all  presented  1n  terms  of  the  "union" 
universe,  but  they  can  be  easily  modified 
for  the  "Intersection"  universe  by  assign- 
ing a  zero  weight  to  any  person  who  1s  not 
1n  every  one  of  the  12  cross-sectional 
universes.  In  Section  III.C  we  compare  the 
procedures  with  respect  to  the  use  of  data 
collected  on  associated  sample  persons  and 
additional  sample  persons.  In  the  full 
paper  there  1s  an  additional  section  with 
examples  of  the  application  of  these 
procedures. 

Procedure  1.  Entry  Date  Height  (ED) 
Each  person  receives  a  single  longitudi- 
nal weight  for  any  time  Interval  that 
contains  at  least  part  of  the  period  for 
which  the  person  was  1n  the  universe, 
namely  the  cross-sectional  weight  for  the 
person  at  his/her  entry  date  Into  the 
universe.  For  all  original  and  associat- 
ed sample  persons,  the  entry  date  Into 
the  universe  1s  the  start  of  the  panel, 
so  their  longitudinal  weights  are  their 
Have  1  cross-sectional  weights.  For  those 
who  enter  the  universe  after  Have  1, 
(additional  sample  persons),  the  longitu- 
dinal weight  1s  the  cross-sectional 
weight  of  the  household,  of  which  they 
are  a  member,  as  of  the  date  they  enter 
the  universe.  If  the  cross-sectional 
weight  of  the  household  at  that  date  1s 
zero,  then  the  additional  sample  person's 
longitudinal  weight  1s  zero. 
Procedure  2.  Beginning  Date  of  Time 
Interval  Height  (BO I) 
Each  person  receives  a  longitudinal 
weight  valid  for  all  time  Intervals  with 
the  same  beginning  date.  Persons  1n  the 
universe  at  the  beginning  date  of  the 
time  Interval  are  assigned  their  respec- 
tive cross-sectional  weights  for  that 
date.  Persons  that  enter  the  universe 
during  the  time  Interval  are  assigned 
their  respective  cross-sectional  weights 
as  of  the  date  they  enter  1t,  as  1n 
Procedure  1. 
Procedure 3. "Mid" Date of the Time
Interval Weight (MDI)
This procedure is similar to Procedure 2.
Each person receives a longitudinal
weight valid for a specific time interval.
Persons in the universe at the
"mid" date of the time interval are
assigned their respective cross-sectional
weights at that date. The difference is
that instead of the person longitudinal
weights being determined at the beginning
date of the time interval, these weights
are determined at some predesignated date
within the time interval. Persons that
enter the universe during the time

most aggressive in utilizing partial data.
It uses all the perfect and partial cases
and none of the completely missing cases.
Also note that it assigns smaller weights,
in general, to the partial cases than the
perfect cases. We think it will tend to
yield the smallest variances with the
greatest risk of bias.
IV.  CONTROLS 

We are currently considering the adjustment of
SIPP longitudinal weights so as to achieve the
variance reductions associated with ratio
estimation while also causing agreement with
SIPP cross-sectional controls on a monthly
basis; i.e., in addition to simple undercoverage
adjustments we are considering the possibility
of forcing the sum of the longitudinal
weights of all persons in the universe in a
given month to equal the cross-sectional
population control for that month. Since
longitudinal weights are fixed over time while
the universe fluctuates over time, such
agreement will not occur unless proper steps
are taken to ensure it. We are also considering
adjustments to force spouses to have equal
longitudinal weights. We are considering these
two possibilities in order to enhance the face
validity of the survey at the least possible
cost of reduced precision.
Objectives 

The primary reason for ratio adjustment of
longitudinal weights is to reduce variances of
longitudinal weights by ensuring representativeness
with respect to demographic variables
which are highly correlated with the variables
to be measured. (This is frequently referred to
as post-stratification.) To the extent that it
corrects for differential undercoverage, it is
also hoped that bias is reduced by ratio
adjustment.

A reasonably good adjustment is to proportionately
adjust the weights of persons by
demographic type in a specified month so that
the weighted counts agree with independent
population estimates by demographic type for
that month. Persons not in sample in the
chosen month are assigned the factor for their
demographic type. This approach operates under
the assumption that the degree to which the
sample represents each demographic type is not
highly variable over time. This adjustment
does not adjust weights to monthly controls
other than those for the chosen month. Another
approach is to make the adjustment for all
persons for each of the 12 data months, then
assign to a person the average of the 12
factors for his/her cell. Such an adjustment
would tend to be influenced less by the
vagaries of sample selection.
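
The two adjustments just described can be sketched in Python as follows. This is an illustration under stated assumptions: the function names, the cell labels, and the data layout (dictionaries keyed by a person identifier) are hypothetical and are not taken from SIPP processing.

```python
from collections import defaultdict
from typing import Dict, List


def ratio_factors(weights: Dict[str, float], cells: Dict[str, str],
                  in_universe: Dict[str, bool],
                  controls: Dict[str, float]) -> Dict[str, float]:
    """Factor per demographic cell: control / weighted count in the month."""
    totals = defaultdict(float)
    for pid, w in weights.items():
        if in_universe[pid]:
            totals[cells[pid]] += w
    return {c: controls[c] / totals[c] for c in controls if totals[c] > 0}


def average_factors(monthly: List[Dict[str, float]]) -> Dict[str, float]:
    """Alternative approach: average each cell's factor over the data months."""
    cell_names = set().union(*monthly)
    return {c: sum(f[c] for f in monthly if c in f)
               / sum(1 for f in monthly if c in f)
            for c in cell_names}


def apply_factors(weights: Dict[str, float], cells: Dict[str, str],
                  factors: Dict[str, float]) -> Dict[str, float]:
    """Every person, in the universe that month or not, gets the cell factor."""
    return {pid: w * factors.get(cells[pid], 1.0) for pid, w in weights.items()}


# Hypothetical data: person "d" is not in the universe in the chosen month.
weights = {"a": 100.0, "b": 300.0, "c": 200.0, "d": 100.0}
cells = {"a": "F", "b": "F", "c": "M", "d": "F"}
in_universe = {"a": True, "b": True, "c": True, "d": False}
controls = {"F": 500.0, "M": 180.0}

factors = ratio_factors(weights, cells, in_universe, controls)
adjusted = apply_factors(weights, cells, factors)
```

After the adjustment the weighted count of in-universe persons in each cell matches its control for the chosen month, and person "d" carries the factor of cell "F" even though "d" did not contribute to it.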

Addressed here is the more complex problem
of adjusting weights for disproportional
representation in a manner such that consistency
with cross-sectional controls is achieved
for each month. This problem has a multitude
of solutions. However, the solution we seek
should be the one which provides the greatest
variance reduction. One possible solution is to
first adjust weights as outlined in the above
paragraph, then further adjust them so that the
desired monthly consistency is achieved while
minimizing the amount by which weights are
further adjusted. This can be done with
Lagrange multipliers or with linear programming.
This approach preserves the benefits of
the initial adjustment by demographic variables
provided that this second adjustment
causes relatively small changes in weights.
Research is needed to determine whether the
second adjustment would indeed cause only small
changes.
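
One way to make the second adjustment concrete is to read it as a least-squares problem: choose changes d_i that minimize the sum of d_i^2 subject to the monthly control constraints. That objective is an assumption of this sketch, since the text does not specify how the amount of adjustment is measured. With Lagrange multipliers lambda, the minimizer is d = A'lambda where (A A')lambda = c - A w, A being the month-by-person indicator matrix of universe membership. All names below are hypothetical, and the sketch does not enforce nonnegative weights.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gauss-Jordan elimination.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def minimal_adjustment(weights: List[float], membership: List[List[int]],
                       controls: List[float]) -> List[float]:
    """Smallest squared change to weights that meets every monthly control.

    membership[t][i] is 1 if person i is in the universe in month t, else 0.
    """
    months, people = len(controls), len(weights)
    gap = [controls[t] - sum(membership[t][i] * weights[i] for i in range(people))
           for t in range(months)]
    gram = [[sum(membership[s][i] * membership[t][i] for i in range(people))
             for t in range(months)] for s in range(months)]
    lam = solve(gram, gap)  # Lagrange multipliers, one per month
    return [weights[i] + sum(lam[t] * membership[t][i] for t in range(months))
            for i in range(people)]


# Hypothetical data: person 2 enters the universe in the second month.
weights = [100.0, 100.0, 100.0]
membership = [[1, 1, 0], [1, 1, 1]]
controls = [210.0, 330.0]
adjusted = minimal_adjustment(weights, membership, controls)
```

The sketch requires the monthly membership rows to be linearly independent; two months with identical membership but different controls cannot both be satisfied.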

A further refinement would be to adjust so
that spouses have equal weights. Naturally,
persons undergo changes in marital status
during the year; some persons may have more
than one spouse over a one year period. Define
a "marriage group" to be a group of persons in
the SIPP sample, each of whom has been or is
married to at least one other person in the
group during the data year. It is possible to
perform an adjustment so that all persons in a
given marriage group have equal weights. This
last adjustment would cause slight disagreements
between longitudinal population estimates
and monthly controls; it appears likely that
such disagreements could be made arbitrarily
small by iteratively repeating the two adjustment
steps for consistency with cross-sectional
estimates and consistency within marriage
groups. For more details, see our full paper.
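
As a rough illustration of the marriage group idea, the Python sketch below forms groups as connected components of the "married to during the data year" relation and gives every member the group mean weight. Both choices are assumptions of this sketch: the text defines the groups but does not say how the common weight is chosen. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def marriage_groups(persons: List[str],
                    marriages: List[Tuple[str, str]]) -> List[List[str]]:
    """Group persons linked, directly or through a shared spouse, by marriage."""
    parent = {p: p for p in persons}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in marriages:
        parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for p in persons:
        groups.setdefault(find(p), []).append(p)
    # Persons never married during the year form no marriage group.
    return [g for g in groups.values() if len(g) > 1]


def equalize(weights: Dict[str, float],
             groups: List[List[str]]) -> Dict[str, float]:
    """Assign each marriage group its mean weight (sketch assumption)."""
    out = dict(weights)
    for g in groups:
        mean = sum(weights[p] for p in g) / len(g)
        for p in g:
            out[p] = mean
    return out


# Hypothetical data: "b" was married to "a" and later to "c" in the data year.
persons = ["a", "b", "c", "d", "e"]
marriages = [("a", "b"), ("b", "c")]
weights = {"a": 100.0, "b": 130.0, "c": 160.0, "d": 90.0, "e": 50.0}
groups = marriage_groups(persons, marriages)
equal = equalize(weights, groups)
```

Using the group mean leaves the total of the weights unchanged, but it can still disturb agreement with the monthly controls, which is why the text suggests alternating this step with the control adjustment.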
Table 1. Case Utilization by Procedure

LONGITUDINAL FAMILY AND HOUSEHOLD ESTIMATION IN SIPP
Lawrence  R.  Ernst,  David  L.  Hubble,  and  David  R.  Judkins,  Bureau  of  the  Census 


1.  INTRODUCTION 

Many  types  of  statistics  will  be  produced  by 
the  Survey  of  Income  and  Program  Participation 
(SIPP),  but  there  is  one  type  that  was  the 
driving  force  behind  the  unique  design  of  the 
survey.  To  be  fully  successful,  SIPP  must  tell 
us  what  happens  to  households  over  the  course 
of  time.  From  it  we  must  obtain  estimates  of 
the  patterns  of  income  receipt,  program  partic- 
ipation, and  labor  force  participation  at  the 
household  and  family  level  by  a  host  of  other 
characteristics.  Of  particular  interest  are 
parameters  such  as  total  annual  household 
income  and  the  number  of  families  that  have 
stopped  drawing  food  stamps  by  demographic 
characteristics. 

Before  estimates  can  be  produced,  a  decision 
must  be  made  on  the  definition  of  a  longitudi- 
nal household  to  be  used  in  this  survey.  (To 
simplify  the  presentation,  we  will  concentrate 
our  discussion  on  longitudinal  households  as 
opposed  to  longitudinal  families.  However, 
parallel  longitudinal  estimation  procedures 
can readily be developed for families.) It
often  happens  that  the  occupants  of  several 
housing  units  move  and  regroup.  We  need  to 
know  which,  if  any,  of  the  resulting  households 
are  to  be  considered  continuations  of  the  pre- 
vious households.  Many  definitions  have  been 
proposed,  but  final  agreement  has  thus  far  not 
been  achieved.  Also  decisions  have  yet  to  be 
made  on  whether  households  that  form  or  dis- 
solve during  a  time  interval  of  interest  are  to 
be  considered  as  part  of  the  universe  for  esti- 
mation purposes.  Because  of  the  absence  of 
agreement  in  these  areas,  several  proposed  def- 
inition and  universe  combinations  will  be  con- 
sidered in  this  paper.  They  are  listed  in 
Section  2.  Also  because  of  this  absence  of 
agreement,  the  major  aim  of  this  paper  will  be 
simply  to  compare  several  possible  longitudinal 
household  estimation  procedures  and  present 
criteria  for  choosing  among  them,  without 
attempting  to  reach  a  conclusion  on  a  preferred 
procedure. 

We foresee several steps in the process of
producing longitudinal household estimates.
The focus in this paper is the first step, the
production of weights that would yield unbiased
estimates assuming there are no data that are
missing or in error, and that the frame coverage
is perfect. Several procedures for obtaining
such weights will be presented in Section
3. Choosing among these procedures is complicated
by the fact that even assuming perfect
response, data needed to produce unbiased
estimates will be missing for some households
because they are not collected with the current
field procedures. This difficulty is principally
due to the fact that, except for a few
household definitions, all unbiased procedures
assign positive weights to some longitudinal
households for time periods when they
are not in sample. The severity of this problem
and the extent to which it is correctable
in the future, by changing field procedures
or by modeling the missing data,
vary by procedure. This problem, along with
descriptions of other important features, both
positive and negative, that estimation procedures
may possess, is presented in Section 4.
Finally, in Section 5 a detailed comparison
of the features of the estimation procedures
under consideration in this paper is presented.

It is assumed in this paper that the reader has
a basic knowledge of SIPP, including the design
of this survey. Nelson, McMillen, and Kasprzyk
(1984) provide this information.

Portions of the original paper, principally
an examples section and a section on adjustments
to the unbiased weights, are omitted
here due to lack of space. The complete paper
is available from the authors.
2.  LONGITUDINAL  HOUSEHOLD  DEFINITIONS 

In  this  section  three  possible  longitudinal 
household definitions are presented to illustrate
the longitudinal weighting procedures
that  will  be  described  in  the  next  section. 
(A  fourth  definition,  known  as  the  Shared 
Experience  Definition  was  included  in  the 
original  paper,  but  omitted  here  due  to  lack 
of  space.  In  terms  of  the  properties  dis- 
cussed in  Section  5  it  is  identical  to  the 
Reciprocal  Majority  Definition  that  is  included 
here.)  A  thorough  discussion  of  longitudinal 
household  definitions  is  presented  in  McMillen 
and  Herriot  (1984).  In  addition,  several  other 
terms  will  be  defined,  including  the  longitud- 
inal household  universes  considered  in  this 
paper. 

Since  household  composition  and  data  for 
SIPP  are  obtained  on  a  monthly  basis,  each 
of  the  definitions  to  be  presented  will  be 
in  terms  of  household  continuity  from  one 
month to the following month. A longitudinal
household over a time interval of
n (≥ 2) months is then defined to be one
which is continuous for each of the n-1 corresponding
pairs of consecutive months. (It has
not  yet  been  decided  if  this  approach  will 
actually  be  used  in  SIPP.) 

For  each  of  the  definitions  below  the  condi- 
tions for  which  household  B  at  month  t+1  is  the 
continuation  of  household  A  at  month  t  are 
stated.  One  condition  that  we  require  that 
all  the  definitions  share  is  that  A  and  B  are 
either  both  family  households  or  both  non- 
family  households.  The  other  conditions  are: 
No  Change  Definition  (NC).  A  and  B  have  the 
same  household  members. 

Same  Householder  Definition  (SH).  A  and  B  have 
the  same  householder.  As  an  alternative, 
householder  could  be  replaced  by  principal 
person  in  this  definition  without  altering 
any  of  the  statements  made  about  it  in  sub- 
sequent sections,  provided  the  final  estima- 
tion procedure  in  Section  3  is  also  modified 
accordingly.  (The  householder  of  a  house- 
hold is,  roughly,  the  person  who  owns  or  rents 
the  housing  unit.  The  principal  person  is 
the wife in a married-couple household, and
the  householder  in  all  other  households.) 
Reciprocal Majority Definition (RM). The majority
of individuals who are both household
members of A at time t and in the universe at
time t+1 are members of B at time t+1, and the
majority of individuals who are both household
members of B at time t+1 and in the universe
at time t are members of A at time
t. (This type of longitudinal definition was
originally developed by Dicker and Casady (1982)
for use in the National Medical Care Utilization
and Expenditure Survey (NMCUES).)
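
The three month-to-month continuity rules can be written as small predicates. The Python sketch below is illustrative only: the `Household` class, its field names, and the reading of "majority" as strictly more than half are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Household:
    members: FrozenSet[str]
    householder: str
    is_family: bool


def same_type(a: Household, b: Household) -> bool:
    """Condition shared by all definitions: both family or both nonfamily."""
    return a.is_family == b.is_family


def continues_nc(a: Household, b: Household) -> bool:
    """No Change: A and B have the same household members."""
    return same_type(a, b) and a.members == b.members


def continues_sh(a: Household, b: Household) -> bool:
    """Same Householder: A and B have the same householder."""
    return same_type(a, b) and a.householder == b.householder


def continues_rm(a: Household, b: Household,
                 universe_t: FrozenSet[str],
                 universe_t1: FrozenSet[str]) -> bool:
    """Reciprocal Majority, with majority taken as strictly more than half."""
    stayers = a.members & universe_t1   # members of A still in the universe
    earlier = b.members & universe_t    # members of B already in the universe
    forward = 2 * len(stayers & b.members) > len(stayers)
    backward = 2 * len(earlier & a.members) > len(earlier)
    return same_type(a, b) and forward and backward


# Hypothetical households: p3 leaves and p4 joins between months t and t+1.
everyone = frozenset({"p1", "p2", "p3", "p4"})
a = Household(frozenset({"p1", "p2", "p3"}), "p1", True)
b = Household(frozenset({"p1", "p2", "p4"}), "p1", True)
c = Household(frozenset({"p3"}), "p3", False)

nc_ab = continues_nc(a, b)
sh_ab = continues_sh(a, b)
rm_ab = continues_rm(a, b, everyone, everyone)
rm_ac = continues_rm(a, c, everyone, everyone)
```

In this toy case B continues A under the SH and RM definitions but not under NC, and the single-person household C continues A under none of them.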

We  will  now  clarify  several  other  terms. 

A household is said to be in existence over
a time interval of n ≥ 2 months if it is
longitudinal over that time interval. Its
period of existence is the longest such time
interval. If a household is
defined cross-sectionally for a month t, but
is not longitudinal over either of the
two-month intervals containing t, then the period
of existence of the household is defined to be
one month.

If t_1 and t_2 are any pair of months, and
longitudinal estimates are to be made over the
interval [t_1, t_2], then the following two possibilities
will be considered in subsequent
sections for the universe of households for
which estimates will be produced.
Restricted Universe (R). The set of all households
in existence over the entire interval
[t_1, t_2].

Unrestricted Universe (U). The set of all
households in existence for one or more months
in [t_1, t_2].
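
Membership in the two universes reduces to a comparison of month indices. A minimal Python sketch, assuming a household's period of existence is given by hypothetical first and last month numbers:

```python
def in_restricted(first: int, last: int, t1: int, t2: int) -> bool:
    """In existence over the entire interval [t1, t2]."""
    return first <= t1 and last >= t2


def in_unrestricted(first: int, last: int, t1: int, t2: int) -> bool:
    """In existence for one or more months in [t1, t2]."""
    return first <= t2 and last >= t1


# Hypothetical household that forms in month 5 and lasts through month 12,
# evaluated for the interval from month 3 to month 8.
restricted = in_restricted(5, 12, 3, 8)
unrestricted = in_unrestricted(5, 12, 3, 8)
```

The household in the example belongs to the unrestricted universe only, since it did not yet exist in months 3 and 4.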

Each  sample  panel  is  interviewed  eight  times. 
Each  of  the  eight  rounds  of  interviews  takes 
four  consecutive  months  to  complete  and  is 
known  as  a  wave. 

Finally,  we  define  an  original  sample  person 
to  be  a  person  that  was  in  sample  during  the 
first  wave  and  will  be  at  least  15  years  of  age 
by the end of the panel.
3.  UNBIASED  WEIGHTING  PROCEDURES 

In  this  section  we  present  five  weighting 
procedures  for  computing  estimates  of  totals 
or  proportions  for  longitudinal  households 
that  would  be  unbiased  in  the  sense  that  the 
expected  value  of  the  estimator  over  all  pos- 
sible samples  is  the  parameter  of  interest 
assuming  no  data  are  missing  or  in  error,  and 
perfect  frame  coverage.  Modifications  and 
adjustments  of  these  estimation  procedures 
necessary  because  of  the  unrealistic  nature 
of  these  assumptions  are  considered  in  the 
original  paper,  but  are  omitted  here  due  to 
lack  of  space.  Except  for  the  Continuous 
Household  Members  procedure,  which  will  only 
be  applied  to  the  restricted  universe,  all  the 
procedures  will  be  stated  for  the  unrestricted 
universe.  To  apply  them  to  the  restricted 
universe  simply  zero  weight  each  household 
which  is  not  in  continuous  existence  over  the 
time  interval  of  interest.  Furthermore,  unless 
otherwise  stated,  all  the  procedures  will  be 
applied  to  all  four  longitudinal  definitions 
defined  in  Section  2. 

First we will explain why a common method
of estimation, weighting by the reciprocal
of the probability of selection, is not feasible
for our purposes, and hence the need to
consider alternative procedures. Let

    X = \sum_{i=1}^{N} x_i

be a parameter of interest, where x_i is the value of
the characteristic for the i-th unit in a population
of size N. Typically in survey work, to
estimate X a sample would be drawn in such a
manner that the i-th unit has a known positive
probability p_i of selection, and X would
be estimated by

    \hat{X} = \sum_{i=1}^{N} w_i x_i ,    (3.1)

where

    w_i = \begin{cases} 1/p_i & \text{if the i-th unit is in sample,} \\ 0 & \text{otherwise.} \end{cases}    (3.2)

Unfortunately  for  household  and  family  estima- 
tion in  SIPP,  both  cross-sectionally  and  longi- 
tudinally, such  an  estimation  approach  is  not 
practical.  For  example,  cross-sectionally  a 
household  is  interviewed  and  used  in  the  esti- 
mation process  for  a  given  month  if  and  only 
if  at  least  one  household  member  is  an  original 
sample  person.  Consequently,  to  use  (3.1)  and 
(3.2)  as  an  estimator  it  would  be  necessary  to 
determine  the  probability  that  at  least  one 
member  of  the  current  household  is  an  original 
sample  person.  It  would  be  operationally 
impossible  to  determine  this  probability,  since 
it  would  first  be  necessary  to  determine  the 
first  wave  households  for  all  current  household 
members  and  then  compute  the  probability  that 
at  least  one  of  these  first  wave  households  was 
selected. 

Fortunately though, it is not necessary that
w_i satisfy (3.2) in order that (3.1) be unbiased.
In fact if w_i is any random variable
associated with the i-th unit in the population
satisfying

    E(w_i) = 1,    (3.3)

then (3.1) is unbiased, that is E(\hat{X}) = X. Thus,
defining unbiased longitudinal household and
family weighting procedures reduces to defining
random variables w_i satisfying (3.3).
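
Condition (3.3) can be checked by brute force on a toy design. The Python sketch below enumerates every simple random sample of size 2 from a hypothetical population of 4 units; the design, the values, and the names are assumptions chosen only to illustrate that E(w_i) = 1 implies an unbiased estimator of the form (3.1).

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population values x_1, ..., x_N.
x = [3, 5, 8, 10]
N, n = len(x), 2
samples = list(combinations(range(N), n))  # all equally likely samples


def weights(sample):
    """Weight N/n for sampled units (selection probability n/N), else 0."""
    return [Fraction(N, n) if i in sample else Fraction(0) for i in range(N)]


# Expected value of each weight over all possible samples.
expected_w = [sum(weights(s)[i] for s in samples) / len(samples)
              for i in range(N)]

# Expected value of the estimator sum_i w_i x_i over all possible samples.
expected_estimate = sum(
    sum(w * xi for w, xi in zip(weights(s), x)) for s in samples
) / len(samples)
```

Exact fractions are used so that the comparison with the population total involves no rounding.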

Before  we  present  the  longitudinal  weighting 
procedures  we  will  state  what,  for  purposes  of 
this  paper,  a  cross-sectional  household  weight 
is, since most of the longitudinal weighting procedures
will be defined in terms of cross-sectional
weights. The first wave cross-sectional
weight  for  a  sample  household  is  taken  here  to 
be  the  reciprocal  of  the  probability  of  selec- 
tion. For  all  nonsample  households  in  the  uni- 
verse this  weight  is  defined  to  be  zero.  For 
any  month  after  the  first  wave  a  different  def- 
inition is  necessary  because  of  possible  changes 
in  household  composition.  So,  the  cross-sec- 
tional household  weight  for  any  such  month  is 
defined  to  be  the  mean  of  the  first  wave  cross- 
sectional  household  weights  for  all  persons  in 
the  household  that  month  who  will  be  at  least 
15  years  of  age  by  the  end  of  the  panel  and  who 
were  in  the  universe  during  the  first  wave. 
This  type  of  weighting  procedure  is  currently 
being  used  in  SIPP  to  produce  cross-sectional 
estimates,  hence  the  name.  It  is  readily 
verifiable  that  the  weights  satisfy  (3.3). 
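
The claim that these cross-sectional weights satisfy (3.3) can be checked on a small hypothetical case. In the Python sketch below, two first wave households are each selected independently with probability 1/2, and a later household is formed from one member of each plus a person who was not in the universe during the first wave. The design and all names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def cross_sectional_weight(members, first_wave_weight):
    """Mean first wave household weight over the eligible members.

    `first_wave_weight` holds a weight (possibly zero) for every person who
    was in the universe in the first wave and meets the age condition;
    persons absent from it are ineligible and are left out of the mean.
    """
    eligible = [p for p in members if p in first_wave_weight]
    if not eligible:
        return Fraction(0)
    return sum(first_wave_weight[p] for p in eligible) / len(eligible)


PROB = Fraction(1, 2)  # hypothetical selection probability per household
first_wave_household = {"a": "H1", "b": "H1", "c": "H2"}
current_household = ["b", "c", "infant"]  # "infant" is not eligible

expected = Fraction(0)
for picked in product([False, True], repeat=2):
    selected = dict(zip(["H1", "H2"], picked))
    weight = {p: (1 / PROB if selected[h] else Fraction(0))
              for p, h in first_wave_household.items()}
    # Each of the four selection outcomes has probability 1/4.
    expected += Fraction(1, 4) * cross_sectional_weight(current_household, weight)
```

Averaging the members' first wave weights keeps the expected weight of the new household at 1, even though the probability that the household is interviewed at all was never computed.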

We also will leave it to the reader to verify
that the weights for each of the longitudinal
procedures to be presented satisfy (3.3) and
hence lead to unbiased estimators.

Beginning  Date  of  Household  Procedure  (BH). 
Each  longitudinal  household  receives  a  single 
weight  valid  for  any  time  interval  that  con- 
tains at  least  part  of  the  period  for  which  the 
household  existed,  namely  the  cross-sectional 
weight  for  the  household  at  the  beginning  date 
of  the  household.  In  particular,  if  there  were 
no  original  sample  persons  in  a  household  at 
its beginning date then its longitudinal
weight would be zero. This approach to longitudinal
household estimation was previously
used in the NMCUES (Whitmore, Cox and Folsom
1982).

Beginning  Date  of  Time  Interval  Procedure 
(BI).  Each  longitudinal  household  receives  a 
longitudinal  weight  valid  for  all  time  intervals 
with  the  same  beginning  date,  namely  the  cross- 
sectional  weight  for  the  household  at  the  begin- 
ning date  of  the  time  interval.  Longitudinal 
households  that  form  during  the  time  interval 
are  assigned  the  cross-sectional  weight  for  the 
household  at  its  beginning  date,  as  in  the 
preceding procedure.

Continuous  Household  Members  Procedure  (CM). 
The following procedure will only be applied
to  the  restricted  universe,  as  defined  in 
Section  2.  For  any  time  interval  for  which  the 
household  is  in  existence  the  longitudinal 
weight  to  be  assigned  is  determined  by  the  set 
of  persons  that  are  members  of  the  household 
throughout  the  time  interval.  The  longitudinal 
household  weight  is  the  cross-sectional  weight 
that  would  be  assigned  to  a  household  consisting 
of  this  set  of  persons;  that  is,  the  average  of 
the  first  wave  weights  of  these  people.  A  lon- 
gitudinal weight  of  zero  is  assigned  to  the 
household  if  there  are  no  original  sample  per- 
sons who  are  members  throughout  the  time  inter- 
val. The  procedure  is  slightly  biased  because 
a  longitudinal  household  with  no  members  con- 
tinuously present  throughout  a  time  interval 
has  no  chance  of  receiving  a  positive  weight, 
thereby  making  satisfaction  of  (3.3)  impossi- 
ble. Since  we  believe  this  situation  will 
rarely  occur,  at  least  for  the  longitudinal 
household  definitions  considered  here,  we 
expect  this  bias  to  be  very  small. 

Average  Cross-Sectional  Household  Weight 
Procedure  (AW).  Each  longitudinal  household 
receives  a  longitudinal  weight  valid  for  a 
specific  time  interval,  namely  the  average  of 
the  monthly  cross-sectional  weights  for  the 
household  over  the  intersection  of  the  life  of 
the  household  and  the  specified  time  interval. 
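
The BH, BI, and AW procedures differ only in which monthly cross-sectional weights they use, which the following Python sketch tries to make explicit. It assumes, for illustration, that a household's monthly cross-sectional weights are stored in a dictionary keyed by the months of its existence, and that the unrestricted universe is intended; the function names are hypothetical.

```python
from typing import Dict


def bh_weight(cs: Dict[int, float]) -> float:
    """BH: cross-sectional weight at the beginning date of the household."""
    return cs[min(cs)]


def bi_weight(cs: Dict[int, float], start: int, end: int) -> float:
    """BI: weight at the interval's start, or at formation if formed later."""
    life = [m for m in cs if start <= m <= end]
    # The first month of existence inside the interval is either `start`
    # or the household's own beginning date.
    return cs[min(life)] if life else 0.0


def aw_weight(cs: Dict[int, float], start: int, end: int) -> float:
    """AW: mean monthly weight over the household's life within the interval."""
    life = [cs[m] for m in cs if start <= m <= end]
    return sum(life) / len(life) if life else 0.0


# Hypothetical household existing in months 3 to 6; an original sample
# person joins in month 5, so earlier cross-sectional weights are zero.
cs = {3: 0.0, 4: 0.0, 5: 1200.0, 6: 1200.0}
bh = bh_weight(cs)
bi_late = bi_weight(cs, start=5, end=6)
bi_early = bi_weight(cs, start=1, end=6)
aw = aw_weight(cs, start=3, end=6)
```

The example shows how the same household can receive a zero weight under BH, a full weight under BI for an interval starting in month 5, and an intermediate weight under AW.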

Note,  there  are  many  procedures,  like  AW, 
that  entail  the  averaging  of  weights,  both 
household  cross-sectional  weights  and  person 
longitudinal  weights.  We  will  examine  only 
one  of  these  procedures  here,  as  an  example  of 
this  type  of  longitudinal  household  weighting 
procedure. 

Householder  Weight  Procedure  (HW).  The  follow- 
ing  procedure  will  be  applied  only  to  the  No 
Change  and  Same  Householder  Definitions,  since 
it  is  appropriate  only  for  definitions  that 
allow  for  a  single  householder  during  the 
household's  existence.  (Generalizations  of  this 
procedure which are not so restricted in their
applicability exist but will not be considered
here.) The procedure assigns a single weight
valid for any time interval that contains at
least  part  of  the  period  for  which  the  house- 
hold existed,  namely  the  first  wave  cross- 
sectional  household  weight  of  the  householder's 
first  wave  household.  A  longitudinal  weight 
of  zero  is  assigned  to  the  household  if  the 
householder  was  not  an  original  sample  person. 

As  will  be  seen  in  Section  5,  this  procedure 
is  clearly  the  one  of  choice  when  the  Same 
Householder Definition is used. If that type
of  definition  is  used  with  householder  replaced 
by  principal  person  then  a  similar  modification 
of  this  estimation  procedure  with  householder 
replaced  by  principal  person  would  be  appro- 
priate. 
4.  POTENTIAL  ADVANTAGES  AND  DISADVANTAGES 

The  ideal  unbiased  weighting  procedure  would 
provide  a  single  set  of  weights  applicable  to 
any  time  interval,  require  no  more  data  than 
were  collected,  and  possess  the  minimum  vari- 
ance among  all  unbiased  procedures.  Unfortu- 
nately, no  such  procedure  exists.  The  proce- 
dures described  in  Section  3  all  fail  one  or 
more  of  these  three  criteria  to  various  de- 
grees. In  this  section,  we  explain  the  nature 
of  the  failures  without  explicitly  comparing 
the  procedures.   That  is  done  in  Section  5. 

Multiplicity  of  Weights.  Some  procedures 
have  the  advantage  of  assigning  to  each  house- 
hold a  single  weight  which  depends  only  on  con- 
ditions as  of  the  first  reference  month  for  the 
household  and  which  is  valid  for  every  interval 
that  the  household  is  in  the  universe.  Other 
procedures  have  the  disadvantage  of  sometimes 
producing  different  weights  for  the  same  house- 
hold for  different  time  intervals.  (Procedures 
with  this  disadvantage  could  be  modified  so 
that  only  a  single  weight  applies  to  any  time 
interval,  by  computing  for  each  household  the 
weight  appropriate  for  that  procedure  for  the 
unrestricted  universe  and  the  2  1/2  year  time 
interval  corresponding  to  the  life  of  the 
panel.  The  weight  obtained  would  also  be  used 
for  any  smaller  subinterval  for  which  the 
household  is  in  the  universe.  However,  weights 
obtained  in  this  manner  might  not  be  able  to  be 
determined  until  the  end  of  the  life  of  the 
panel.  This  would  make  them  difficult  to  use 
because  we  would  have  to  wait  until  the  last 
data  from  the  panel  were  processed  before 
estimates  could  be  produced  for  any  earlier 
time  period.  In  any  case,  such  weights  would 
often  lead  to  higher  variances  for  short  time 
intervals  than  weights  developed  specifically 
for  the  short  time  intervals.) 

Unavailable  Data  Requirements.  Most  defini- 
tion  and  procedure  combinations  require  data 
from  some  households  for  time  periods  when  the 
household  is  in  existence  but  not  in  sample, 
that  is  for  time  periods  for  which  interviews 
are  not  conducted  for  the  household  because  no 
original  sample  people  are  members  of  the 
household.  This  needed  data  could  be  informa- 
tion for  determining  proper  longitudinal 
weights  or  subject-matter  information  for  use 
in tabulating the estimates. Some of this
information is not collected for the 1984 panel
of SIPP because of the current operational
procedures. This is a consequence of the fact
that agreement has not been reached on the
longitudinal household definition to be used
in SIPP. In this vacuum, operational procedures
were determined mainly by considerations
of difficulty and cost. Once a definition
has been agreed on, depending on the nature of
the unavailable data, it might be possible to
change operational procedures for future SIPP
panels so that the required data are collected.
To understand the problem with current operational
procedures, consider the following situation.
A household is longitudinal from
month t_B to t_F. Original sample people are
part of the longitudinal household only from
month t_1 to t_2. If t_B < t_1, then some prior
information may be unavailable. Revised operational
procedures to obtain this information
might involve retrospective questions, longer
reference periods or proxy data on anyone who
left the household before the first interview.
If t_2 < t_F, then some posterior information may
be unavailable. Revised operational procedures
might involve interviewing the household
through t_F.

One  of  the  important  discriminants  between 
the  weighting  procedures  is  how  successfully 
they  avoid  the  need  for  data  from  the  period 
that  the  longitudinal  household  exists  but  is 
not  in  sample.  (The  need  for  such  data  is 
avoided  by  assigning  zero  weights  to  these 
problem households.) In terms of information
needed for weighting, some procedures require
only enough data to determine whether t_B < t_1,
while others need to know t_B even when it is
less than t_1. Similarly, some procedures only
require knowledge of whether t_2 < t_F, while
others need to know t_F even when it is greater
than t_2. Furthermore, besides this need for
information for determination of weights, if
any parameters other than the number of longitudinal
households are to be estimated, then
required subject-matter data may be missing as
well, either before t_1, after t_2, or both.

While  the  problem  of  missing  information  is 
a  serious  one,  it  is  not  fatal.  Procedures  can 
be  developed  to  compensate  for  the  unavailable 
data.  Specifically,  the  data  collected  on 
these  households  while  they  were  in  sample 
should  be  sufficient  for  performing  imputation 
for  existence/nonexistence  outside  the  in- 
sample  period  and  formation  and/or  dissolution 
dates.  The  imputed  values  can  then  be  used  to 
calculate  weights  for  these  households.  These 
households can then be treated as noninterviews
so that mover households with
similar demographic characteristics but with
complete data receive increased weights while
the deficient households themselves receive
zero weights.

If  the  models  underlying  the  procedures 
developed  for  adjusting  for  the  missing  infor- 
mation are  true  then  it  is  still  possible  to 
obtain  unbiased  estimators,  although  now  in  a 
model-based  sense.  Furthermore,  since  the 
missing  information  that  we  are  concerned  with 
here  is  not  caused  by  refusal  to  respond, 
modeling  in  this  context  might  not  suffer  from 
the  usually  imperfect  assumptions  on  similarity 
between  respondents  and  nonrespondents  that 
underlie  any  adjustments  that  use  data  from 
respondents  to  account  for  data  missing  from 
refusals. In addition, because of the longitudinal
nature of the survey, there is generally
a large amount of data available from the
problem  households  that  could  be  used  in  such 
adjustments.  However,  if  the  models  are  not 
perfect,  then  in  general,  the  larger  the  pro- 
portion of  data  required  that  is  unavailable, 
the  greater  the  potential  for  serious  bias 
problems. 

Variances.  In  general,  estimation  proce- 
dures  with  the  smallest  variances  are  those 
that utilize available data intensively and
tailor  the  weights  to  the  specific  time  inter- 
val of  interest.  Unfortunately,  as  shall  be 
seen  in  the  next  section,  such  procedures  are 
often  characterized  by  heavy  needs  for  unavail- 
able data  which,  as  noted  above,  may  impact 
unfavorably  upon  bias.  Thus,  there  often  is  a 
direct  trade-off  between  variance  and  the  risk 
of  bias.  It  will  be  difficult  to  weigh  these 
factors  against  each  other,  since  it  appears 
that  no  single  procedure  will  provide  the 
correct  balance  for  all  of  the  multitude  of 


For  use  in  the  next  section,  we  will  define 
some  labels  for  the  advantages  and  disadvan- 
tages identified  in  the  foregoing  discussion. 
Let: 
T1  mean that a single longitudinal weight
    exists for each household, valid for all
    time intervals for which the household is
    in the universe, and which depends only
    on conditions which could be determined
    during the first interview,
T2  mean the negation of T1,
BW1 mean that no data from the period preceding
    the first interview are unavailable
    but required for weighting,
BW2 mean that we need to know for weighting
    whether the longitudinal household existed
    before the first interview,
BW3 mean that we need to know for weighting
    the conception date of the household
    (within the time interval of interest),
BD1 mean that no subject-matter data from the
    period preceding the first interview are
    unavailable but required,
BD2 mean the negation of BD1,
FW1 mean that no data from the period following
    the last interview are unavailable
    but required for weighting,
FW2 mean that we need to know for weighting
    the dissolution date of the household
    (within the time interval of interest),
FD1 mean that no subject-matter data from the
    period following the last interview are
    unavailable but required,
FD2 mean the negation of FD1.
Note that T1, BW1, BD1, FW1 and FD1 are the
desirable properties.

5.  DETAILED   COMPARISONS   OF   ADVANTAGES   AND 
DISADVANTAGES 

Table 1 below presents advantages and disadvantages of each
definition, procedure and universe combination. A comparison of these
features follows the table. Next, an explanation of each entry in the
table is given. Finally, a discussion of data utilization, which is
not in Table 1, is presented.
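Table 1. Advantages and disadvantages of each definition, procedure
and universe combination (X = property applies)

Definition  Procedure  Universe      T1   T2   BW1  BW2  BW3  BD1  BD2  FW1  FW2  FD1  FD2
NC          All        Both          X         X              X         X         X
SH          HW         Both          X         X              X         X         X
SH, RM      BH         Unrestricted  X              X         X         X              X
SH, RM      BH         Restricted    X              X         X              X         X
SH, RM      BI         Unrestricted       X         X         X         X              X
SH, RM      BI         Restricted         X    X              X              X         X
SH, RM      CM         Restricted         X    X              X         X         X
SH, RM      AW         Both               X              X         X         X         X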

Comparison of Features in Table 1. As noted at the end of Section 4,
T1, BW1, BD1, FW1, and FD1 are the desirable properties. For the NC
definition all five procedures considered here possess all these
desirable properties, as does the HW procedure for the SH definition.

However, for the SH and RM definitions, and most other definitions
too, the BH, BI, and CM procedures have different subsets of the set
of desirable features, so that the procedure to be adopted depends,
at least in part, on the features deemed most important. AW possesses
none of these desirable features for these two definitions. Its
principal advantage lies in possible reductions in variances because
of complete utilization of available data, which will be discussed
later. BH has advantages T1, BD1, and FW1 for the unrestricted
universe, and T1 and BD1 for the restricted universe. The main reason
for consideration of this procedure would be that it is the only one
among BH, BI and CM that always has advantage T1. BI has advantages
BD1 and FW1 for the unrestricted universe and BW1 and BD1 for the
restricted universe. Its principal advantage over BH is that for the
restricted universe no retrospective questions need be asked. CM
(which is only applicable to the restricted universe) possesses all
desirable features except T1; that is, no information not currently
collected is needed for this procedure. Recall, however, that CM had
the disadvantage of being slightly biased, as explained in Section 3.

Explanation  of  Entries  in  Table  1.  All 
explanations  presented  below  apply  to  both 
universes  unless  otherwise  stated. 

NC Definition, All Procedures. Since the composition of a household
is unchanged throughout its period of existence under NC, we have the
following two possibilities:

(a) No original sample people were in the household at any time
    during its period of existence, in which case the longitudinal
    household weight is zero for any time interval and procedure.

(b) One or more original sample people were in the household
    throughout its existence, in which case the beginning and ending
    dates of the household are known, as is the composition of the
    household and complete data for each month of its existence.
    Consequently, features BW1, BD1, FW1, and FD1 apply.

Furthermore, T1 applies since procedures BH, BI, CM, and AW all
reduce to the cross-sectional household weight at the beginning date
of the household, while HW is the weight of the householder at the
beginning date.

SH Definition, HW Procedure. The explanation is similar to the one
given above, except now the two cases are: (a) The householder was
not an original sample person, (b) The householder was an original
sample person.

SH and RM Definitions, BH Procedure. T1 is applicable, since by
definition the weight is the cross-sectional household weight as of
the beginning date of the household. BW2 applies because the
longitudinal household weight is the cross-sectional household weight
as of the first month in sample if the household began that month,
while otherwise the weight will be zero since there were no original
sample people in the household when it began. (For the restricted
universe, households which entered sample after the beginning of the
time interval always receive a zero weight.)

BD1 holds since all households with positive weights were in sample
at their beginning date and no retrospective subject-matter data is
therefore needed.

FW1 holds for the unrestricted universe since the weight is
determined at the beginning date of the household. However, for the
restricted universe, it is necessary to know if the household
continued to exist throughout the entire time interval because it
receives a zero weight for the time interval if it did not continue.
Under current procedures a household which no longer has any original
sample person is not followed, and it would therefore generally not
be possible to determine if it remained in existence for the entire
time interval. Consequently, FW2 applies.

FD2 applies since there would be missing data for all households with
positive weights which continued to exist after there were no longer
any original sample people present, which could happen for either of
these definitions.

SH and RM Definitions, BI Procedure. T2 is applicable since time
intervals with different beginning dates may yield different
longitudinal weights. BW1 applies for the restricted universe, since
the longitudinal weight is the cross-sectional household weight as of
the first month of the time interval for all households in sample
that month, and zero for all other households. However, BW2 applies
for the unrestricted universe since longitudinal households that
entered sample after the beginning of the time interval are treated
as in the BH procedure.

BD1 holds since any household with a positive weight was either in
sample the first month of the time interval or the month that the
household began, and consequently, no retrospective data are needed.

As in the BH procedure, and for the same reasons, FW1 applies for the
unrestricted universe, FW2 for the restricted universe and FD2 for
both universes.

SH and RM Definitions, CM Procedure, Restricted Universe. T2 is
applicable since any two intervals may yield different longitudinal
weights.

Furthermore, BW1, BD1, FW1, and FD1 apply. The explanation is similar
to that given for the NC definition except now the two cases are:
(a) No original sample people were household members for the entire
time interval, (b) At least one original sample person was a
household member for the entire time interval.

SH  and  RM  Definitions,  AW  Procedure.  For  an 
explanation  for  this  row  of  the  table  see  the 
original  paper. 

Utilization of Data. Having compared the procedures with respect to
needs for unavailable data and the multiplicity of weights, we now
turn our attention to variance. To compare the variance
characteristics of the procedures we will focus on the amount of
collected data that is used in obtaining estimates, since this is a
primary determinant of variance. This discussion will also better
illustrate the proportion of data needed for estimation that is
unavailable for each procedure. In general, the greater this
proportion is, the larger the burden is on any missing data procedure
employed, with a resulting greater potential for bias problems. To
make the comparison we show in Table 2 all 24 possible cases of how
the data on a longitudinal household may be complete, partly
available, or nonexistent for a particular time interval.

The symbols tB, t1, t2, and tE denote the beginning date of the
household, the first sample month, the last sample month, and the
ending date of the household, respectively. The columns indicate
different time intervals. Interval B is the interval of interest.
Interval A is from tB until the beginning of interval B, while
interval C is from the end of interval B until tE. The fifth case,
for example, is of a household that formed before interval B about
which we are missing some data pertinent to the early part of
interval B. The first nine cases comprise the restricted universe.
The last 15 cases fill out the unrestricted universe. Each case is
marked as having complete data, partial data, or no data. Of course,
all of this assumes perfect response. The only type of missingness
that we are discussing here is that caused by operational procedures.
On the right there is a column for each procedure with an "A" entered
if it always uses the case, an "S" if it sometimes uses the case but
not always (which will be explained in the discussion that follows),
and a blank otherwise. These comparisons do not apply to the NC
definition, for which all five procedures use all the complete cases
and no other cases.


The BH procedure uses the complete cases 1, 10, 12, and 13, but does
not use the complete cases 2 and 11. It also uses the partial cases
3, 14, 16, and 17, and cases 7 and 22 for which there is no data in
interval B. The BI procedure uses all the complete cases, more of the
partial cases and none of the cases with no data. We thus think the
BI procedure will tend to produce smaller variances than the BH
procedure since it uses more of the available data. However, it is
not clear in general which of these two procedures has the smaller
proportion of needed data that is missing.

The CM procedure is appealing for the restricted universe since it
uses all the complete cases (except in the rare situation when there
is at least one original sample person present for every month of
interval B, but none of them are present for the entire interval),
and none of the other cases. It should thus have fairly small
variances and has only the slight bias indicated in Section 3.
However, it is not applicable to the unrestricted universe.

The HW procedure uses the same complete cases as the BH procedure,
except it does not use these cases when the householder is not an
original sample person, and it uses none of the other cases. However,
it is not applicable to the RM, and most other longitudinal household
definitions.

The AW procedure is the most aggressive in utilizing partial data. It
uses all the complete and partial cases while avoiding the cases with
no data. Also note that it assigns smaller weights, in general, to
the partial cases than the complete cases. We believe it will tend to
produce the smallest variances for most definitions, particularly in
the unrestricted universe, but also tends to have the highest
proportion of data that is needed for estimation but unavailable.
INTRODUCTION

Missing  data  in  sample  surveys  are  of  two  general 
forms.  Unit  nonresponse  occurs  when  no  information 
is  available  to  the  survey  for  an  entire  sample  unit, 
such  as  a  person,  or  household,  or  hospital.  Some 
information  may,  however,  be  available  from  other 
kinds  of  records  such  as  those  used  to  define  the 
sample  frame.  The  reasons  for  unit  nonresponse  vary; 
for  example,  a  person  may  refuse  to  respond,  be  away 
from  home,  or  be  impossible  to  locate.  Typically,  this 
form  of  nonresponse  is  handled  in  part  by  a  call-back 
strategy.  That  is,  the  interviewer  makes  repeated 
attempts  to  contact  the  unit.  If  the  call-back  strategy 
fails,  or  is  not  feasible,  weights  can  be  assigned  to  the 
responding  units  (Cochran,  1977). 

The  other  type  of  nonresponse  is  item 
nonresponse.  It  occurs  when  the  unit  supplies 
information  for  some  but  not  all  of  the  variables.  For 
example,  a  person  may  answer  questions  about  age, 
race,  and  sex  but  not  about  income;  or  the  information 
may  be  deleted  by  an  edit  failure.  Depending  upon  the 
intended  uses  of  the  data,  item  nonresponse  can  be 
handled  with  two  different  but  overlapping 
approaches.  Either  the  data  can  be  completed  using 
imputation  methods,  or  the  recorded  data  can  be  used 
with  modified  estimation  methods.  The  modified 
estimation  methods  may  also  be  used  to  impute  the 
missing  data. 

The  focus  of  this  paper  is  the  imputation  of 
categorical  data  in  a  longitudinal  survey.  Statistical 
research  pertaining  to  missing  categorical  data  has 
considered  censored,  discrete  random  variables  and 
partially  or  completely  unobserved  data  in  contingency 
tables. Hartley's (1958) solution to the problem of
estimating the rate parameter for a censored Poisson
random variable is a special case of what was later
called the EM algorithm. Fuchs (1982) applied the EM
algorithm  to  find  maximum  likelihood  estimates  for 
parameters  in  a  log-linear  model,  when  the  values  of 
one  or  more  variables  are  missing  for  subsets  of  the 
cross-classified  data.  Chen  and  Fienberg  (1974) 
developed  models  for  analyzing  contingency  tables 
with  supplemental  marginal  totals. 

Unfortunately, none of these methods offers a solution to the problem
of missing categorical data in complex, longitudinal surveys such as
the Survey of Income and Program Participation (SIPP). Although a
contingency table could be constructed from monthly responses to a
categorical survey item over a year, the resulting twelve-dimensional
table would be exceedingly sparse. In addition, the application of
log-linear models or the EM algorithm to such tables would be
computationally difficult.

In  this  paper  we  describe  a  general  method  for 
imputing  missing  categorical  items  in  longitudinal 
surveys.  We  show  that  the  longitudinal  data, 
completed  according  to  the  method,  provides  unbiased 
estimates  of  the  probability  of  occurrence  of  the 
various  response  patterns,  assuming  that  the  data  are 
observed  at  random  and  missing  at  random  (Rubin, 
1976).  The  importance  of  longitudinal  information  for 
imputing  missing  data  is  discussed,  and  a  statistic 
measuring  the  amount  of  information  available  is 
described. 

The imputation methodology described here was developed from data
collected by the Income Survey Development Program (ISDP). The method
is suggested as the fundamental tool for imputing missing,
longitudinal, categorical items in the Survey of Income and Program
Participation (SIPP). However, its implementation can occur only
after further development and modifications. Here, it is described as
a general, statistical approach applicable to any longitudinal
survey. The data from the ISDP is utilized only to explain the method
and provide examples.

The Income Survey Development Program (ISDP) was initiated to gain
experience with the data collection and data analysis requirements of
SIPP. The ISDP is a longitudinal survey consisting of two national
panels (1978, 1979). The sample design is a multistage stratified
sample of the United States population. Sampling elements are housing
units, not households (which may move) or persons. The first sampling
stage involves the definition of the United States in terms of
counties or groups of counties called primary sampling units (PSUs),
which are stratified. At the second stage, a sample of addresses
within the PSUs is selected. To minimize the inconvenience to sample
participants, interviews are conducted every three months. Each
household is assigned to one of three rotation groups (A, B, C).
Every three months all the households in a rotation group are
interviewed and data is collected for each of the previous three
months. A wave is the time period during which each rotation group is
interviewed once. Data from each wave is published by the United
States Bureau of the Census as a cross-sectional file. The
longitudinal data for our imputation research is an annual file,
constructed by merging five waves of ISDP data from the 1979 panel.

THE IMPUTATION OF MISSING LONGITUDINAL
CATEGORICAL SURVEY ITEMS

Many of our activities today are the direct result of events which
occurred yesterday. Last night we may have arrived home late,
returning from a long trip. Today, it is likely that we will need to
stop off at the gas station to refill our car's fuel tank. Or perhaps
yesterday we were laid off from our job. Today we are reading the
employment opportunities section of the newspaper.

Analogously, in the ISDP, there are strong dependencies between the
monthly values of the survey items. For example, fitting a logistic
regression of the receipt of wages and salaries in July on the
receipt reported in other months, we found the parameters for the
months June, August, and November to be significantly different from
zero. Similar results were obtained in regressions of each month on
the remaining months.

Define a longitudinal record for a survey unit to be the set of
responses recorded over a fixed time period. In the ISDP as well as
SIPP, the survey unit is a household, but other examples of survey
units include the person, family, and employer. In this paper, the
person is the unit of analysis. The set of responses on the
longitudinal record may be any combination of survey items. Here, we
restrict ourselves to a single item recorded monthly for one year,
for example, the receipt of wages and salaries.




The following example illustrates the imputation process. Consider
the ISDP survey item indicating whether a person had a job or
business during a month. Further, consider the set of individuals in
rotation group A who responded "yes" from January through November
1979, but did not respond in December 1979. The longitudinal record
for these individuals is given by


X  =  (0,0,0,0,0,0,0,0,0,0,0,2), 


where $X_t = 0$ $(t = 1, \ldots, 12)$ if the response in the $t$-th
month is "yes", $X_t = 1$ if the response is "no", and $X_t = 2$
indicates missing data. Either "0" or "1" is an admissible imputation
value for $X_{12}$. Based on the individuals in rotation group A who
reported data in every month from January to December we estimate

$$\mathrm{Prob}(X_{12} = 0 \mid X_1 = 0, X_2 = 0, \ldots, X_{11} = 0) = 0.9723,$$

and

$$\mathrm{Prob}(X_{12} = 1 \mid X_1 = 0, X_2 = 0, \ldots, X_{11} = 0) = 1 - 0.9723 = 0.0277.$$

Generating  a  random  number  between  zero  and  one, 
we  impute  X12  =  0  if  the  random  number  is  less  than 
or  equal  to  0.9723,  otherwise  we  impute  X12  =  1. 

This  imputation  procedure  can  be  applied  to  any 
categorical  survey  item  with  any  combination  of 
missing  months.  Consider  the  sample  item  indicating 
the  monthly  receipt  of  wages  and  salaries  and  the 
following  longitudinal  record  for  persons  in  rotation 
group  A 

$$X = (0,0,0,0,0,0,0,2,2,2,0,0).$$

Based on those persons responding in all twelve months, we estimate

$$\mathrm{Prob}(X_8 = x_8, X_9 = x_9, X_{10} = x_{10} \mid X_1 = 0, \ldots, X_7 = 0, X_{11} = 0, X_{12} = 0) \qquad (1)$$

for each combination $(x_8, x_9, x_{10})$. The estimate is 0.9825 for
$x_8 = 0$, $x_9 = 0$, $x_{10} = 0$, and 0.0088, 0.0035, 0.0026,
0.0018, and 0.0009 for the five other combinations that occur among
the complete records.

Here, we impute the entire subvector $(x_8, x_9, x_{10})$ based on a
random draw from a uniform (0,1) distribution.

The imputation process is formalized by letting the random variable
$X$ represent the responses (and missing data) on a longitudinal
record. The vector $X = x$ can be partitioned into subvectors $x_m$
and $x_r$, representing the missing and recorded monthly values,
respectively. On the $i$-th longitudinal record, we impute the
missing items $x_{mi}$ based on the reported values $x_{ri}$. The
imputed values are a random draw from the conditional distribution
$f(x_m \mid X_r = x_{ri})$, empirically estimated from the
longitudinal records with values reported in every month.
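
The procedure just formalized can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
the program used for the ISDP: the function names
`conditional_distribution` and `impute` are hypothetical, records are
assumed to be tuples of monthly codes with 2 marking a missing month,
and the donor file in the usage note is made up.

```python
import random
from collections import Counter

MISSING = 2  # code used in the text for a missing monthly value


def conditional_distribution(complete_records, record):
    """Empirical distribution of the missing months given the reported ones.

    Only complete records that agree with `record` on every reported
    month are counted as donors.
    """
    missing_idx = [t for t, v in enumerate(record) if v == MISSING]
    reported_idx = [t for t, v in enumerate(record) if v != MISSING]
    counts = Counter(
        tuple(donor[t] for t in missing_idx)
        for donor in complete_records
        if all(donor[t] == record[t] for t in reported_idx)
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete record matches the reported months")
    return missing_idx, {pat: c / total for pat, c in counts.items()}


def impute(complete_records, record, rng=random):
    """Fill the missing months with one draw from the conditional distribution."""
    missing_idx, dist = conditional_distribution(complete_records, record)
    u = rng.random()  # uniform draw on [0, 1)
    cumulative = 0.0
    for pattern, prob in sorted(dist.items()):
        cumulative += prob
        if u <= cumulative:
            break
    # If rounding leaves `cumulative` just below 1, the last pattern is used.
    completed = list(record)
    for t, value in zip(missing_idx, pattern):
        completed[t] = value
    return tuple(completed)
```

With a made-up donor file of 35 all-"yes" records and one record that
is "no" in December, `impute(donors, (0,) * 11 + (2,))` returns the
record with December set to 0 with probability 35/36 and to 1
otherwise.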

AN  UNBIASED  ESTIMATE  OF  THE  OCCURRENCE 
PROBABILITY  OF  A  LONGITUDINAL  PATTERN 

Response  patterns  to  survey  items  are  singularly 
important  in  longitudinal  surveys.  The  longitudinal 
data  is  collected  so  that  changes  over  time  of  the 
survey  items  can  be  analyzed.  For  example,  a 
researcher  may  wish  to  accurately  estimate  the 
average  duration  of  unemployment  or  the  length  of 
time  an  individual  participates  in  a  social  welfare 
program.  It  is  important  that  the  imputations  do  not 
disrupt  the  frequency  distribution  of  response  patterns 
and  bias  these  longitudinal  estimates. 

Consider a simple random sample of size $n$ without nonresponse. The
longitudinal records for individuals in the labor force every month
are represented by

$$X = (0,0,0,0,0,0,0,0,0,0,0,0). \qquad (2)$$

Let the binomial random variable $T$ represent the number of times
the pattern (2) occurs. It follows that

$$\frac{1}{n}\, T(X_1 = 0, X_2 = 0, \ldots, X_{12} = 0) \qquad (3)$$

is an unbiased estimate of

$$\mathrm{Prob}(X_1 = 0, X_2 = 0, \ldots, X_{12} = 0).$$

Of course, in longitudinal surveys with complex sample designs like
SIPP, the statistic (3) would need to be modified to reflect the
particular survey design.

In longitudinal files completed according to the imputation method
described above, statistics analogous to (3) are also unbiased
estimates of the probability that the particular pattern occurs,
provided the data are missing at random and observed at random
(Rubin, 1976). We prove this result for longitudinal records
containing two time periods. Without loss of generality the result
extends to longitudinal records of any length.

THEOREM 

Consider the longitudinal record $(X_1 = a, X_2 = b)$, where $a$ and
$b$ represent the only values of the categorical random variables
$X_1$ and $X_2$. In a simple random sample of size $n$, completed by
imputation, let the binomial random variable $T'(X_1 = a, X_2 = b)$
represent the number of occurrences of the longitudinal record.
Assuming the data are observed at random and missing at random,

$$\frac{1}{n}\, T'(X_1 = a, X_2 = b)$$

is an unbiased estimate of

$$\mathrm{Prob}(X_1 = a, X_2 = b).$$

Proof: 

The pattern $(X_1 = a, X_2 = b)$ can arise in the imputed sample in
four ways:

1) $(X_1 = a, X_2 = b)$ is reported,

2) $X_1 = a$ is imputed given $X_2 = b$ is reported,

3) $X_2 = b$ is imputed given $X_1 = a$ is reported,

4) $(X_1 = a, X_2 = b)$ is imputed.

Define the binomial random variable $T(\cdot)$ as the number of
occurrences of the event in parentheses. For example, using an
asterisk to indicate imputed counts,

$$T^*(X_1 = a \mid X_2 = b)$$

represents the number of times that $X_1 = a$ is imputed given that
$X_2 = b$ is reported.

The total number of times the pattern

$$(X_1 = a, X_2 = b)$$

occurs in the sample, completed by imputation, can be decomposed into
terms corresponding to the four ways the pattern $(a, b)$ arises:

$$T'(X_1 = a, X_2 = b) = T(X_1 = a, X_2 = b) + T^*(X_1 = a \mid X_2 = b) + T^*(X_2 = b \mid X_1 = a) + T^*(X_1 = a, X_2 = b). \qquad (4)$$

Let the indicator vector $Y = (Y_1, Y_2)$ represent the reporting
status of the elements in the longitudinal record. That is,

$$Y_j = \begin{cases} 1 & \text{if } X_j \text{ is reported } (j = 1, 2), \\ 0 & \text{otherwise.} \end{cases}$$

The expected value of the sum (4) with respect to the data reported
in the sample is

$$
\begin{aligned}
E\big(T'(X_1 = a, X_2 = b) \mid T(X_1 = a, X_2 = b)\big)
  ={}& T(X_1 = a, X_2 = b, Y_1 = 1, Y_2 = 1) \\
  &+ T(X_2 = b, Y_1 = 0, Y_2 = 1) \cdot
     \frac{T(X_1 = a, X_2 = b, Y_1 = 1, Y_2 = 1)}{T(X_2 = b, Y_1 = 1, Y_2 = 1)} \\
  &+ T(X_1 = a, Y_1 = 1, Y_2 = 0) \cdot
     \frac{T(X_1 = a, X_2 = b, Y_1 = 1, Y_2 = 1)}{T(X_1 = a, Y_1 = 1, Y_2 = 1)} \\
  &+ T(Y_1 = 0, Y_2 = 0) \cdot
     \frac{T(X_1 = a, X_2 = b, Y_1 = 1, Y_2 = 1)}{T(Y_1 = 1, Y_2 = 1)}. \qquad (5)
\end{aligned}
$$

Note that the random variables in the conditional expectation (5) are
multinomial. The expectation with respect to all possible samples is
found by applying the following result.


Let $(X_1, \ldots, X_k)$ be multinomial $(n; P_1, \ldots, P_k)$
random variables; then $X_1$ and $X_3$ are independent given
$X_1 + X_2 = t$, and


The expectation of (5) with respect to all possible samples follows
from the lemma. In addition, the assumption that the data are
observed at random and missing at random asserts the independence of
the indicator random vector $Y$ and the random variables in the
longitudinal record.

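$$
\begin{aligned}
E\big(T'(X_1 = a, X_2 = b)\big)
  ={}& E_2 E_1\big(T'(X_1 = a, X_2 = b) \mid T(X_1 = a, X_2 = b)\big) \\
  ={}& n\,\mathrm{Prob}(X_1 = a, X_2 = b)\,\mathrm{Prob}(Y_1 = 1, Y_2 = 1) \\
  &+ n\,\mathrm{Prob}(X_2 = b)\,\mathrm{Prob}(Y_1 = 0, Y_2 = 1) \cdot
     \frac{\mathrm{Prob}(X_1 = a, X_2 = b)}{\mathrm{Prob}(X_1 = a, X_2 = b) + \mathrm{Prob}(X_1 = b, X_2 = b)} \\
  &+ n\,\mathrm{Prob}(X_1 = a)\,\mathrm{Prob}(Y_1 = 1, Y_2 = 0) \cdot
     \frac{\mathrm{Prob}(X_1 = a, X_2 = b)}{\mathrm{Prob}(X_1 = a, X_2 = a) + \mathrm{Prob}(X_1 = a, X_2 = b)} \\
  &+ n\,\mathrm{Prob}(Y_1 = 0, Y_2 = 0)\,\mathrm{Prob}(X_1 = a, X_2 = b) \\
  ={}& n\,\mathrm{Prob}(X_1 = a, X_2 = b).
\end{aligned}
$$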

The  theorem  is  extended  to  longitudinal  records  of 
any  length  by  adding  the  appropriate  terms  to 
equation  (3). 
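
The two-period result can be checked informally by simulation. The
sketch below is an illustration only, under assumptions that are not
in the paper: the joint distribution `JOINT`, the reporting
probability 0.8, the sample size, and the function names are all
hypothetical, and samples in which some incomplete record has no
donor are discarded.

```python
import random

# Hypothetical joint distribution of (X1, X2) over the two values 0 and 1.
JOINT = {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.1}


def completed_count(n, p_report, rng, target=(0, 1)):
    """Simulate one sample, impute it, and return T'(target).

    Returns None if some incomplete record has no matching donor.
    """
    cells = list(JOINT)
    sample = rng.choices(cells, weights=[JOINT[c] for c in cells], k=n)
    # Reporting indicators (Y1, Y2) are drawn independently of the data.
    status = [(rng.random() < p_report, rng.random() < p_report)
              for _ in range(n)]
    donors = [x for x, y in zip(sample, status) if y == (True, True)]
    by_x1 = {v: [d for d in donors if d[0] == v] for v in (0, 1)}
    by_x2 = {v: [d for d in donors if d[1] == v] for v in (0, 1)}
    count = 0
    for (x1, x2), (y1, y2) in zip(sample, status):
        if y1 and y2:
            value = (x1, x2)       # pattern is fully reported
        else:
            if y1:
                pool = by_x1[x1]   # X2 imputed given X1 reported
            elif y2:
                pool = by_x2[x2]   # X1 imputed given X2 reported
            else:
                pool = donors      # both periods imputed
            if not pool:
                return None
            # A uniform pick among matching complete records is a draw
            # from the empirical conditional distribution.
            value = rng.choice(pool)
        count += value == target
    return count


def mean_estimate(reps=1000, n=200, p_report=0.8, seed=1984):
    """Average of T'/n over repeated simulated samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        t = completed_count(n, p_report, rng)
        if t is not None:
            estimates.append(t / n)
    return sum(estimates) / len(estimates)
```

Under these assumptions the average returned by `mean_estimate()`
should lie close to `JOINT[(0, 1)]`, which is what the theorem
predicts.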

THE EXPECTED NUMBER OF INCORRECT
IMPUTATIONS

Longitudinal data by itself may not always be sufficient to
accurately impute missing data. The amount of information available
longitudinally can be measured by estimating the expected number of
incorrect imputations. Consider the longitudinal record for the
monthly receipt of wages and salaries,

$$X = (0,0,0,0,0,0,0,0,0,0,0,2), \qquad (6)$$

where $X_t = 0$ indicates receipt and $X_t = 2$
$(t = 1, \ldots, 12)$ indicates missing data. The probability

$$\mathrm{Prob}(X_{12} = 1 \mid X_1 = 0, \ldots, X_{11} = 0)$$

is estimated from the completely reported cases as 8/1236 = 0.0065.
The imputed value is drawn independently of the true value but with
the same probability. Consequently, the probability that
$X_{12} = 1$ is imputed and is correct is $(0.0065)^2$. Similarly,
the probability that $X_{12} = 0$ is imputed and is correct is
$(0.9935)^2$. It follows that the estimated probability of an
incorrect imputation for the longitudinal record (6) is

$$1 - \left[(0.0065)^2 + (0.9935)^2\right] = 0.013.$$

Since there are seventeen individuals in the file with this
longitudinal record, it follows that the estimated number of
incorrect imputations is 17(0.013) = 0.22.
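
The arithmetic above is easy to reproduce. The short sketch below is
an illustration only; the function name `prob_incorrect` is
hypothetical, and the inputs are the figures quoted in the text (8
"no" responses among 1236 complete cases, 17 records to impute).

```python
def prob_incorrect(probabilities):
    """Probability that a random draw from `probabilities` differs from an
    independent true value with the same distribution: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in probabilities)


p_no = 8 / 1236                      # estimated Prob(X12 = 1 | X1..X11 = 0)
p_wrong = prob_incorrect([p_no, 1 - p_no])
expected_wrong = 17 * p_wrong        # seventeen records share this pattern

print(round(p_wrong, 3), round(expected_wrong, 2))  # prints: 0.013 0.22
```

The same function accepts any number of categories, so it applies
unchanged when several months are imputed jointly.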

The need to include demographic information would be indicated by an
estimated number of incorrect imputations greater than some
predetermined value. Consider again the longitudinal record for the
monthly receipt of wages and salaries,

$$X = (0,0,0,0,0,0,0,2,2,2,0,0).$$

Thirty-eight persons in rotation group A had this pattern. Using the
probabilities given in (1), the estimated expected number of
incorrect imputations is

$$38\,(1 - 0.9825^2 - 0.0088^2 - 0.0035^2 - 0.0026^2 - 0.0018^2 - 0.0009^2) = 1.25.$$

Here, we want to use demographic information to choose the most
appropriate donor pattern. One approach is to include associated
survey items as elements in the longitudinal record. For example, to
impute the monthly receipt of wages and salaries, we can include in
the longitudinal records survey items indicating seasonal or
part-time workers. A logistic model may also be useful, especially
when the data are sparse. Letting the polychotomous variable Y
represent the available donor patterns, we can regress Y on
concomitant data, represented by the vector X. Based on the
concomitant information, the probability of pattern h for the i-th
longitudinal record is


The pattern selected for imputation can be the one with the highest
probability, or the decision can be based on a random number
generated between zero and one.

CODING  PATTERNS 

The  responses  on  any  longitudinal  record  can  be 
summarized  as  a  single  number.  Consider  the 
longitudinal  record 

X  *  (0,0,0,0,0,0,1,1,1,2,2,2), 

representing  the  receipt  of  wages  and  salaries  from 
January  (X,)  to  December  (X12).  This  pattern  can  be 
represented  in  base  ten  as 

377  =  (2x3°)  +  (2X31)  +  (2x32)  +  33  +  34  +  35  . 

In general, any pattern in an annual file of monthly
categorical data can be represented by the polynomial

    P_i = \sum_{k=1}^{12} c_k B^{k-1}.

Each pattern has a unique base ten representation,
because the transformation is one-to-one and onto. The
index k represents the months in the longitudinal file
in reverse order. That is, k=1 represents December,
k=2 represents November, and so on. The coefficients
c_k represent the monthly values of the item. The
letter B represents the appropriate base. Typically,
the base is one more than the highest coefficient (c_k).
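
As a minimal sketch of this coding, the following Python functions convert a monthly record to its base ten code and back. The function names are illustrative only; the record is assumed to be listed from January to December, so the last entry receives the exponent zero.

```python
def encode_pattern(record, base):
    """Return the sum of c_k * base**(k-1), where k=1 is the last month."""
    code = 0
    for k, value in enumerate(reversed(record), start=1):
        if not 0 <= value < base:
            raise ValueError("monthly value must lie between 0 and base - 1")
        code += value * base ** (k - 1)
    return code


def decode_pattern(code, base, months=12):
    """Recover the monthly record (first month first) from its base ten code."""
    digits = []
    for _ in range(months):
        code, digit = divmod(code, base)
        digits.append(digit)  # the least significant digit is the last month
    return tuple(reversed(digits))


record = (0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2)
code = encode_pattern(record, base=3)  # the example pattern from the text
```

Because the mapping is one-to-one, `decode_pattern(code, 3)` returns the original record.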

Coding  the  longitudinal  record  patterns  as  base  ten 
numbers  operationally  simplifies  the  imputation 
process.  Consider  the  longitudinal  record 

X  =  (0,0,0,0,0,0,0,0,0,2,2,2), 

indicating the receipt of wages and salaries in each
month from January through December. The receipt of
wages and salaries in the t-th month is denoted by
Xt = 0, and a missing monthly item is denoted by
Xt = 2. This pattern is represented in base ten by the
number  26.  Because  the  transformation  to  base  ten  is 
unique,  all  individuals  in  the  data  file  with  the  value  26 
for  their  pattern  have  reported  the  receipt  of  wages 
and  salaries  from  January  to  September,  but  did  not 
respond  to  the  item  from  October  to  December. 

Donor patterns from the cases reporting values in
every month are identified by subtraction. For
example, the donor pattern

    (X10 = 0, X11 = 0, X12 = 0)

is identified by subtracting the base three number 222
from the longitudinal pattern:

      000000000222
    -          222
    --------------
      000000000000

The equivalent operation could also be done in base
ten. Noting that 222 is represented in base ten by 26,
the donor pattern (X10 = 0, X11 = 0, X12 = 0) is the base
three representation of (26 - 26) = 0. Similarly, all
possible donor patterns, i.e., 000 through 111, are found by
subtracting from 26 the corresponding base ten
numbers 26 through 0.
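
The subtraction can be sketched in Python as follows. The helper name `to_code` and the restriction of reported monthly values to 0 and 1 are assumptions made for this illustration; each candidate donor replaces the three missing months (coded 2) by reported values.

```python
from itertools import product


def to_code(digits, base=3):
    """Base ten value of a digit sequence written most significant digit first."""
    code = 0
    for d in digits:
        code = code * base + d
    return code


# Record with October through December missing (coded 2).
missing_code = to_code((0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2))

donor_codes = {}
for tail in product((0, 1), repeat=3):
    # Amount to subtract so that each missing digit 2 becomes the donor digit.
    subtrahend = to_code(tuple(2 - d for d in tail))
    donor_codes[tail] = missing_code - subtrahend
```

Each entry of `donor_codes` is the base ten code of a fully reported pattern that agrees with the incomplete record in the months that were reported.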

APPLICATIONS  AND  EXTENSIONS  OF  THE  METHOD 

Limitations  on  the  number  of  pages  available  in 
these  proceedings  preclude  a  complete  discussion  of 
our  research  on  longitudinal  item  imputation.  A  more 
extensive  description,  especially  as  it  applies  to  the 
Survey  of  Income  and  Program  Participation,  can  be 
found  in  Samuhel  and  Huggins  (1984). 

EARLY INDICATIONS OF ITEM NONRESPONSE ON THE SURVEY OF INCOME AND PROGRAM PARTICIPATION
by  John  F.  Coder  and  Angela  M.  Feldman 


Introduction 

The  Survey  of  Income  and  Program  Participa- 
tion (SIPP)  promises  to  become  the  most  important 
source  of  data  for  measuring  the  level  of  and 
changes  in  the  economic  well-being  of  the  U.S. 
population.  Collection  of  these  data  began  in 
the fall of 1983. The survey design for the
initial sample of 25,900 housing units in the
noninstitutional population calls for each
household to be interviewed at 4-month intervals
over a 2-1/2 year period. The sample is divided
into 4 rotations or panels of equal size, and one
panel is interviewed in each month throughout
this period, resulting in a total of eight personal
contacts by Census interviewers for each
sample household.

The  first  interviews  in  this  new  survey  were 
conducted  during  October,  November,  and  December 
of  1983,  and  January  1984.  The  questionnaire 
used  to  collect  information  in  the  initial  inter- 
view concentrates  on  labor  force  participation 
and sources and amounts of income. Most data are
recorded  separately  by  month  for  the  4-month 
reference  period  ending  in  the  month  prior  to  the 
month  of  interview.  For  example,  data  collected 
in  the  October  1983  interviews  covered  the  June 
through  September  period.  Most  interviews  were 
completed  during  the  first  2  weeks  of  the  inter- 
view month. 

The  primary  purpose  of  this  paper  is  to 
present  some  preliminary  indications  of  the  item 
nonresponse  rates  for  the  first  interviews  of 
SIPP.  These  rates  of  nonresponse  cover  labor 
force,  income  recipiency,  and  income  amounts. 
The  effect  of  self  or  proxy  respondents  on  nonre- 
sponse rates  is  discussed  for  a  selected  group  of 
items.  Some  data  on  other  aspects  of  the  survey 
have  also  been  included.  These  are  overall 
household  noninterview  rates,  average  times  re- 
quired for  interviews,  and  use  of  callback  proce- 
dures to  obtain  missing  information. 

Item  Nonresponse 

Item  nonresponse  is  defined  in  this  paper  to 
mean  a  missing  answer  to  a  specific  question  that 
should  have  been  answered.  Item  nonresponse  can 
result  for  many  reasons,  the  most  frequent  being 
lack  of  knowledge  by  the  respondent,  i.e.,  "Don't 
Knows,"  and  refusals  to  answer.  Nonresponse  can 
also  result  when  the  interviewer  fails  to  record 
a  response  in  the  correct  location  or  follows  an 
incorrect  path  within  the  questionnaire  design. 

Labor  Force  Items—Table  1  shows  preliminary  non- 
response  rates  for  items  2a,  2b,  4,  5a,  5b,  6a, 
6b,  6c,  7a,  7b,  and  8a  of  the  labor  force  and 
recipiency  section  on  the  first  interview  ques- 
tionnaire. The  questions  themselves  are  shown  in 
Figure  1. 

In  general,  the  nonresponse  rates  for  the 
labor  force  questions  were  low  (see  table  1). 
The  nonresponse  rate  on  item  2a,  incidence  of 
looking  for  work  or  on  layoff  for  persons  who  did 
not  work  at  all  during  the  reference  period 
(nonworkers)  was  only  0.4  percent.  About  6.7 
percent of the nonworkers reporting looking or on
layoff  had  a  nonresponse  for  item  2b,  the  number 
of  weeks  spent  looking  or  on  layoff. 


The  comparable  nonresponse  rates  for  workers  were 
1.0  percent  for  incidence  of  looking  or  on  layoff 
(item  7a)  and  3.2  percent  for  item  7b,  the  number 
of  weeks  spent  looking  or  on  layoff.  The  nonre- 
sponse rate  for  item  4,  asking  if  the  respondent 
held  a  job  or  business  during  the  entire  4-month 
reference  period,  was  less  than  0.1  percent. 

One  of  the  questions  with  a  relatively  high 
nonresponse  rate  in  the  labor  force  section  was 
item  5b  covering  the  number  of  weeks  absent  with- 
out pay  for  persons  having  a  job  for  the  entire 
period.  The  nonresponse  rate  for  this  question 
was  11.6  percent. 

Item  8a  is  the  question  covering  the  number 
of  hours  usually  worked  per  week  during  the 
4-month  period.  This  critical  data  item  was 
missing  for  1.3  percent  of  the  25,510  sample 
persons  reporting  a  job  or  business  during  the 
reference  period. 

Income  Recipiency. — A  major  portion  of  the  ques- 
tionnaire was  designed  to  determine  the  sources 
of  income  received  during  the  4-month  period  by 
each  household  member  age  15  years  old  and  over. 
A  total  of  52  different  income  sources  (other 
than  earnings  from  employment)  were  covered  in 
the  survey.  Tables  2  and  3  show  income  recipien- 
cy nonresponse  rates  and  ratios  of  nonresponses 
to  "YES"  responses  for  SIPP  and  the  March  1983 
CPS  for  a  selected  group  of  income  types.  The 
rates  refer  to  the  4-month  reference  period  for 
SIPP  and  calendar  year  1982  for  the  March  CPS. 

The  nonresponse  rates  for  SIPP  are  extremely 
low  and  vary  only  slightly  by  rotation.  The  non- 
response  rate  on  recipiency  for  SIPP  ranged  from 
less  than  0.1  for  Aid  to  Families  with  Dependent 
Children  and  private  pensions  to  1.3  percent  for 
stocks  or  mutual  funds.  In  contrast,  the  rates 
for  the  March  1983  CPS  clustered  around  the 
10-percent  level.  These  rates  for  the  March  CPS 
are  largely  attributable  to  the  7  percent  house- 
hold noninterview  rate  on  the  income  supplement 
questionnaire. 

The  last  two  columns  of  table  3  show  the 
ratios  of  nonresponses  to  "YES"  responses  for 
SIPP  and  the  March  CPS.  This  measure  of  non- 
response  may  be  better  than  the  overall  nonre- 
sponse rate  because  it  provides  a  measure  that  is 
relative  to  the  size  of  the  recipient  universe. 
The  March  CPS  ratios  are  again  much  higher  than 
those  encountered  in  the  first  interview  of  SIPP. 
This  difference  is  also  related  to  the  7  percent 
March  supplement  noninterview  rate.  Given  this 
fixed  nonresponse  rate  the  ratio  is  inversely 
related  to  the  proportion  of  the  population  re- 
ceiving a  specific  income  type.  This  is  evident 
by  the  large  ratio  of  4.01  for  Aid  to  Families 
with  Dependent  Children.  The  ratio  itself  means 
that,  in  this  case,  the  number  of  nonresponses 
and,  therefore,  imputations  required  exceeded  the 
number  of  "YES's"  by  a  factor  of  4  to  1. 
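
The inverse relationship can be illustrated with a small calculation. This is a sketch only: the function name is made up for the example, and the two recipiency proportions are hypothetical values chosen to show the effect, not figures from the tables.

```python
def nonresponse_to_yes_ratio(nonresponse_rate, yes_proportion):
    """Ratio of nonresponses to "YES" responses.

    Both arguments are proportions of the same universe of persons.
    """
    return nonresponse_rate / yes_proportion


fixed_rate = 0.097  # a nonresponse rate near the 10-percent level
common_source = nonresponse_to_yes_ratio(fixed_rate, 0.20)  # hypothetical 20% recipiency
rare_source = nonresponse_to_yes_ratio(fixed_rate, 0.02)    # hypothetical 2% recipiency
```

With the nonresponse rate held fixed, the rarer income source produces a ratio ten times as large, which is why income types received by a small share of the population show the largest ratios.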

Hourly  Wage  Rates. --The  nonresponse  rates  on 
hourly  wages  are  shown  in  table  4.  These  rates 
are  shown  separately  by  type  of  respondent.  The 
nonresponse  rate  was  9.5  percent  overall,  5.1 
percent  for  self  response  and  16.7  percent  for 
proxy  response.  The  overall  nonresponse  rate  for 
hourly wages increased from the 7.8 percent level
in October to 10.5 percent in January. This
resulted mainly from an increase in the nonresponse
rate for proxy responses from 13.8 percent in
October to 19.2 percent in January. Approximately
62 percent of the respondents were "self."

Monthly  Wage  or  Salary  Income. --Table  5  contains 
the  nonresponse  rates  for  the  monthly  amounts  of 
wage  and  salary  income.  The  nonresponse  rate 
overall  averaged  about  6.2  percent  for  the  ini- 
tial SIPP  interviews.  The  rate  for  self  respon- 
dents, which  accounted  for  64  percent  of  the 
total,  was  lower,  4.6  percent,  while  the  rate  for 
proxy  respondents  was  9.0  percent.  The  9.0-per- 
cent nonresponse  rate  for  proxy  interviews  on 
monthly  earnings  amounts  was  considerably  lower 
than  the  comparable  rate  of  16.7  percent  for 
hourly  wage  amounts.  Nonresponse  rates  increased 
from  5.4  percent  to  6.7  percent  between  October 
and  January. 

Self-Employment  Income. --Nonresponse  rates  for 
self-employment  income  have  traditionally  ex- 
ceeded those  for  most  income  types.  The  items  in 
the self-employment section of the SIPP questionnaire
cover monthly amounts of "salary" and other
income  received  by  owners  of  businesses,  profes- 
sional practices,  farms,  etc.  The  question  is 
not  designed  to  obtain  estimates  of  the  busi- 
ness's  net  profit  on  a  monthly  accounting  period. 
An  additional  question  was  included  covering  es- 
timated net  profit  for  the  entire  4-month  refer- 
ence period.  The  nonresponse  rate  overall  for 
the  monthly  salary  or  other  income  received  by 
the  self-employed  was  14.0  percent  (see  table  6). 
The  nonresponse  rate  for  proxy  interviews  ex- 
ceeded that  of  self-responses  by  a  considerable 
margin.  The  rate  for  proxy  interviews  was  22.3 
percent  compared  to  9.8  percent  for  self  re- 
sponses. The  October  nonresponse  rate  of  13.6 
percent  was  not  significantly  different  from  the 
January  rate  of  15.1  percent.  About  two-thirds 
of  respondents  for  this  item  were  "self." 
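
As a rough consistency check on the self and proxy figures quoted above, the overall rate for an item should be close to the average of the self and proxy rates weighted by the proportion of self responses. The sketch below assumes that every response is classified as either self or proxy; the function name is illustrative, and the inputs are the rounded values reported in the text.

```python
def overall_rate(self_share, self_rate, proxy_rate):
    """Overall nonresponse rate implied by the self and proxy rates."""
    return self_share * self_rate + (1.0 - self_share) * proxy_rate


# (proportion self, self rate, proxy rate, published overall rate), in percent
reported = {
    "hourly wage rate": (0.62, 5.1, 16.7, 9.5),
    "monthly wage or salary": (0.64, 4.6, 9.0, 6.2),
    "self-employment income": (0.66, 9.8, 22.3, 14.0),
}

implied = {
    item: overall_rate(share, self_r, proxy_r)
    for item, (share, self_r, proxy_r, _published) in reported.items()
}
```

For all three items the implied rate agrees with the published overall rate to within rounding of the inputs.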

Interest  Income. --Table  7  contains  nonresponse 
rates  for  interest  amounts  received  during  the 
SIPP  4-month  reference  period.  These  rates  cover 
the  interest  amount  received  from  one  or  more  of 
the  following  sources:  1)  regular  or  passbook 
savings,  2)  money  market  deposit  accounts,  3) 
certificates  of  deposit,  or  other  savings  certi- 
ficates, and  4)  NOW  accounts  or  other  interest 
earning  checking  accounts.  The  nonresponse  rate 
for  interest  income  from  these  sources  was  34.6 
percent.  The  rate  in  January  was  35.4  percent, 
somewhat  higher  than  the  32.6  percent  for  Octo- 
ber. About  4  percent  of  the  total  number  of  non- 
responses  on  interest  amounts  can  be  attributed 
to  refusals.  The  remainder  were  mainly  categor- 
ized as  "Don't  Knows."  A  "Don't  Know"  response 
to  interest  income  was  followed  by  a  question  to 
obtain  the  balance  or  amount  in  the  account. 
The  nonresponse  rates  for  this  item  are  also 
shown  in  table  7.  The  nonresponse  rate  for 
balances  in  savings  was  24.2  percent.  In  combi- 
nation these  two  nonresponse  rates  indicate  that 
both  the  interest  amount  and  the  balance  amount 
were  missing  in  only  about  13.3  percent  of  the 
sample  cases  for  these  sources  of  interest 
income. 


Dividend  Income. --The  questions  covering  the 
amount  of  dividend  income  received  were  divided 
into  two  categories,  those  dividends  actually 
received  and  those  credited  against  a  margin 
account  or  automatically  reinvested  in  additional 
shares  of  stock.  As  indicated  by  the  data  in 
table  8,  the  nonresponse  rates  for  these  two 
categories  differ  significantly.  The  rate  for 
dividends  actually  received  was  9.4  percent.  The 
rate  for  dividends  credited  was  30.7  percent. 


Noninterview  Rates 

The  noninterview  rate  is  a  measure  of  the 
proportion  of  occupied  housing  units,  i.e.,  those 
eligible  for  interview,  for  which  interviews  were 
not  obtained.  As  mentioned  earlier  the  total 
sample  size  for  the  1983  SIPP  was  about  25,900 
housing  units.  Of  this  total  about  4,600  were 
not  eligible  for  interview.  These  ineligible 
units  were  found  to  be  vacant,  demolished,  under 
construction,  or  unoccupied  for  other  reasons. 
This  left  19,900  households  eligible  to  be  con- 
tacted. Interviews  were  not  obtained  for  4.8 
percent  of  this  group  (see  table  9).  Most  nonin- 
terviews,  about  77  percent,  were  refusals  to  par- 
ticipate. The  remainder  of  the  total  noninter- 
view rate  consisted  of  situations  classified  as 
"no  one  home"  and  "temporarily  absent."  These 
classifications  were  assigned  after  repeated 
visits  failed  to  yield  a  contact. 

The noninterview rate varied considerably by
region of the country. The lowest noninterview
rate was 2.4 percent from the Kansas City Regional
Office that covers Kansas, Missouri, Iowa,
Minnesota, and Wisconsin. The highest noninterview
rate was 10.1 percent from the New York
Regional Office covering the parts of New York
and New Jersey in the vicinity of New York City.

There was slight variation in the noninterview
rates by month of interview; however, these
rates were not significantly different from one
another. The rate for the first month of interview
was 5.1 percent compared to 4.3 percent, 5.2
percent,  and  4.8  percent  in  the  succeeding  3 
months,  respectively.  The  overall  noninterview 
rate  for  SIPP  (4.8  percent)  was  not  significantly 
different  from  the  overall  rate  for  the  March 
1983  CPS  (4.4  percent)  or  the  rate  for  the  panel 
coming  into  the  March  1983  CPS  for  the  first  time 
(5.4  percent).  As  noted  earlier,  about  7.0  per- 
cent of  the  March  CPS  sample  households  completed 
the  monthly  labor  force  questions  but  were  nonin- 
terviews  on  the  income  supplement.  These  cases 
are  in  addition  to  the  4.4  percent  household 
noninterviews. 

Callback  Items 

The  design  of  the  SIPP  questionnaire  incor- 
porated procedures  for  following  up  on  missing 
responses  to  items  identified  as  either  especial- 
ly important  to  the  overall  quality  of  the  survey 
data  or  with  previously  noted  high  nonresponse 
rates.  The  first  step  in  this  process  was  the 
determination  that  the  answer  to  the  designated 
question  would  be  available  from  another  house- 
hold member  not  present  at  the  time  of  the  inter- 
view or  at  a  later  date.  If  so,  the  interview- 
ers, in  most  cases,  called  back  by  telephone  to 
obtain  the  missing  information.  The  data  in 
table  10  summarize  use  of  the  callback  system. 

The callback system appears to be most effective
for obtaining missing data on amounts of
monthly  wage  and  salary  income.  About  600  cases 
were  marked  for  callback  for  these  amounts.  The 
procedure  obtained  responses  to  the  missing 
earnings  amounts  in  about  7-out-of-10  cases.  Use 
of  the  callback  was  less  successful  in  obtaining 
missing  amounts  for  the  other  income  sources. 
Slightly  more  than  half  (54  percent)  of  the  call- 
backs were  successful  for  obtaining  data  for  the 
monthly  amount  of  salary  and  other  income  re- 
ceived from  self-employment.  Attempts  to  follow 
up  on  amounts  of  interest  and  dividend  income 
from  various  sources  proved  to  be  even  less  ef- 
fective. About  45  percent  of  the  respondents 
were  able  to  supply  an  amount  when  contacted  by 
an  interviewer.  Use  of  the  callback  procedures 
appears  to  have  declined  between  the  October  and 
January interviews. Generally, the number of
cases marked for follow-up in January was lower
than in October. While less frequent use of the
callback  might  have  been  related  to  a  reduced 
need  for  follow-up,  nonresponse  rates  for  some  of 
these  income  types  tended  to  increase  between 
October  and  January,  indicating  the  opposite. 

Interview  Time 

The  time  required  to  conduct  an  initial  SIPP 
interview  is  potentially  quite  long  given  the 
number  of  questions.  Obviously  households  with  a 
large  number  of  adult  members,  those  15  years  old 
and  over,  are  those  that  are  exposed  to  the 
longest  overall  interview  times,  on  average.  The 
data  in  table  11  provide  the  first  estimates  of 
interview  times  based  directly  on  times  entered 
on each person's questionnaire by the interviewers.
The time required to complete the household
control  card  and  roster  was  added  to  the  inter- 
view time  on  the  first  questionnaire  for  the 
household.  These  estimates  are  shown  by  size  of 
household  for  the  first  interview  period  of  SIPP. 

The  median  interview  time  was  43  minutes  for 
all  households  in  the  first  interview.  The 
median  interview  time  declined  steadily  from  48 
minutes  in  October  to  41  minutes  in  January.  The 
median  household  interview  time  for  1-person 
households  was  about  one-half  hour  while  that  for 
4-person  households  was  one  hour  and  ten  minutes. 
Households  with  5,  6,  and  7  or  more  members  re- 
quired proportionally  more  time  for  interviews. 

Summary 

This examination of some of the early "returns"
from the 1983 SIPP is, for the most part,
encouraging. The household noninterview rate was
lower  than  most  had  anticipated.  The  item  nonre- 
sponse rates  were  much  lower  than  those  experi- 
enced in  the  March  CPS.  Proxy  responses  caused 
significantly  higher  nonresponse  rates  for  some 
of  the  key  items  studied. 

There  is  reason  for  concern,  however,  in 
several  areas  and  these  should  be  watched  close- 
ly. The  first  is  the  general  trend  toward  higher 
nonresponse  rates  between  October  and  January 
interviews.  The  second  is  the  relatively  high 
noninterview  rate  for  the  New  York  area.  While 
this  is  consistent  with  our  experiences  in  other 
surveys,  this  rate  should  be  monitored  closely  as 
will  the  rates  in  the  other  regions. 

The next step in the evaluation of the 1983
SIPP data will be comparison of the survey estimates
of income recipients with figures derived
from program statistics and other independent
sources. This analysis will provide a very important
look at the magnitude of survey underreporting,
a major concern of SIPP and other household
income surveys.


Figure 1.  Selected Labor Force Questions

NONWORKERS

2a.  Even though ... did not have a job during
     this period, did ... spend any time looking
     for work or on layoff from a job?

     [ ] YES -- ASK 2b
     [ ] NO

2b.  In which weeks was ... looking for work or
     on layoff from a job?

4.   Did ... have a job or business, either full
     or part time, during EACH of the weeks in
     this period?

     [ ] YES -- ASK 5a
     [ ] NO -- ASK 6a

5a.  Was ... absent without pay from ...'s job or
     business for any FULL weeks during the
     4-month period?

     [ ] YES -- ASK 5b
     [ ] NO

5b.  In which weeks was ... absent without pay?

WORKERS WITH WEEKS WITHOUT A JOB OR BUSINESS

6a.

6b.  Was ... absent from work for any full weeks
     without pay?

     [ ] YES -- ASK 6c
     [ ] NO

6c.  In which weeks was ... absent without pay?

7a.  During the weeks that ... did not have a job
     did ... spend any time looking for work or
     on layoff?

     [ ] YES -- ASK 7b

7b.  In which of these weeks was ... looking for
     work or on layoff from a job?

8a.  In the weeks that ... worked during the
     4-month period, how many hours did ... usually
     work per week?

Table 1.  Selected Item Nonresponse Rates for the
          Labor Force Items on the 1983 SIPP:
          Interview No. 1

                          Rotation
Item     Total     One     Two    Three    Four

2a         0.4     0.4     0.4     0.4     0.3
2b         6.7     8.2     6.8     5.9     5.9
4          0.1     0.1     0.1     (Z)     0.1
5a         0.1     0.1     0.1     0.1     0.1
5b        11.6    12.6    11.0     8.2    14.4
6a         2.2     2.9     2.0     1.9     1.8
6b         3.3     6.6     2.3     1.8     1.4
6c         6.8     2.1    12.2     3.3    10.5
7a         1.0     0.9     1.0     1.1     0.9
7b         3.2     4.7     3.7     2.0     2.0
8a         1.3     1.3     1.3     1.2     1.2

Z  Less than .05 percent.


Table 2.  Selected Item Nonresponse Rates for
          Income Recipiency During the 4-month
          Reference Period on the 1983 SIPP:
          Interview No. 1

                                           Rotation
Income type                  Total     One     Two    Three    Four

Social Security                0.6     0.6     0.6     0.5     0.5
Unemployment compensation      0.1     0.1     0.1     0.1     0.1
Veteran's payments             0.2     0.2     0.2     0.2     0.2
Aid to Families with
  Dependent Children           (Z)     (Z)     (Z)     (Z)     (Z)
Food stamps                    0.3     0.4     0.4     0.2     0.2
Private pensions               (Z)     (Z)     (Z)     0.1     (Z)
Savings accounts               1.0     0.8     0.8     1.1     1.1
Shares of stock or
  mutual funds                 1.3     1.4     1.1     1.3     1.3
Rental property                1.0     1.0     0.7     0.9     1.1

Z  Less than .05 percent.


Table 3.  Selected Income Nonresponse Rates from
          the March 1983 CPS, Ratio of Nonresponses
          to "YES" Responses for the March 1983 CPS,
          and Ratio of Nonresponses to "YES" Responses
          for Interview No. 1 of the 1983 SIPP

                              March 1983    March 1983 CPS     1983 SIPP
                              CPS           ratio of           ratio of
Income type                   nonresponse   nonresponses       nonresponses
                              rate          to "YES's"         to "YES's"

Social Security                   9.6           0.61               .03
Unemployment compensation         9.6           1.16               .03
Veteran's payments                9.6           1.14               .10
Aid to Families with
  Dependent Children              9.7           4.28               .01
Food stamps                       6.4           0.84               .07
Private pensions                  9.6           1.64               .01
Savings accounts                 10.4            .21               .02
Shares of stock or
  mutual funds                    9.7           0.69               .09
Rental property                   9.7           0.66               .13


Table 4.  Nonresponse Rates on Hourly Wage Rate
          by Type of Respondent for the 1983
          SIPP: Interview No. 1

                                           Rotation
Type of respondent           Total     One     Two    Three    Four

Total                          9.5     7.8     9.3    10.4    10.5
Self                           5.1     4.1     4.7     5.9     5.6
Proxy                         16.7    13.8    16.1    18.0    19.2

Proportion of
  Self Responses               .62     .62     .60     .63     .64


Table 5.  Nonresponse Rates on Monthly Wage and
          Salary Income by Type of Respondent for
          the 1983 SIPP: Interview No. 1

                                           Rotation
Type of respondent           Total     One     Two    Three    Four

Total                          6.2     5.4     5.8     6.8     6.7
Self                           4.6     4.2     4.3     4.9     4.9
Proxy                          9.0     7.6     8.4    10.2    10.1

Proportion of
  Self Responses               .64     .63     .63     .64     .66

Table 6.  Nonresponse Rates on Monthly Amounts of
          Self-Employment Income for the 1983
          SIPP: Interview No. 1

                                           Rotation
Type of respondent           Total     One     Two    Three    Four

Total                         14.0    13.6    12.6    14.6    15.1
Self                           9.8     9.5     9.7     9.6    10.2
Proxy                         22.3    21.4    18.6    24.3    24.7

Proportion of
  Self Responses               .66     .66     .67     .66     .66


Table 7.  Nonresponse Rates for Amounts of
          Interest Income from the 1983 SIPP:
          Interview No. 1

                                           Rotation
Item                         Total     One     Two    Three    Four

Interest amount               34.6    32.6    33.8    37.1    35.4
  Percent refusals             4.2     4.1     4.0     4.6     4.1
Balance amount                24.2    23.6    24.1    24.9    24.1


Table 8.  Nonresponse Rates for Amounts of
          Dividend Income for the 1983 SIPP:
          Interview No. 1

                                           Rotation
Item                         Total     One     Two    Three    Four

Dividends received             9.4    10.3     8.3     9.8     9.3
Dividends credited            30.7    28.2    33.8    30.1    30.5


Table 9.  Household Noninterview Rates by
          Regional Office for the 1983 SIPP:
          Interview No. 1

                                           Rotation
Regional office              Total     One     Two    Three    Four

Total                          4.8     5.1     4.3     5.2     4.8
Boston                         3.8     2.9     2.5     5.4     4.6
New York                      10.1    13.3     8.3    10.8     8.4
Philadelphia                   3.0     2.0     3.4     2.5     4.1
Detroit                        4.1     3.0     3.6     5.4     4.1
Chicago                        4.8     5.0     3.4     5.7     5.0
Kansas City                    2.4     1.6     1.6     4.0     2.5
Seattle                        4.7     5.1     4.4     5.2     4.3
Charlotte                      3.5     4.3     2.7     2.8     3.8
Atlanta                        4.9     5.4     5.0     5.2     4.2
Dallas                         5.1     5.0     5.1     4.6     5.8
Denver                         5.3     6.1     5.7     4.1     5.5
Los Angeles                    7.5     9.3     6.2     8.9     5.8


Table 10.  Success Rates of Callback Items

                                           Rotation
Item                         Total     One     Two    Three    Four

Success Rates
Wages and salary              71.0    76.2    76.9    70.0    59.0
Self-employment               54.0    58.6    55.0    48.3    54.5
Interest and dividends        44.8    48.4    49.6    38.2    40.8

Number of Callbacks
Wages and salary               599     172     143     150     134
Self-employment                100      29      20      29      22
Interest and dividends         582     192     139     131     120


Table 11.  Median Household Interview Times by
           Number of Members 15 Years Old and
           Over from the 1983 SIPP: Interview
           No. 1
           (minutes)

                                           Rotation
Number of persons            Total     One     Two    Three    Four

Total                           43      48      44      42      41
One                             29      33      30      26      26
Two                             44      50      45      42      41
Three                           57      64      57      55      55
Four                            70      76      72      67      66
Five                            83      90      81      84      77
Six                             98     105     111     101      71
Seven or more                  113     114     (B)     120      94

B  Less than 10 sample households.

DISCUSSION
Roy Whitmore, Research Triangle Institute

1.  Introduction

The Bureau of the Census is to be commended
for presenting papers dealing with proposed
methodology during the planning stages of the
Survey of Income and Program Participation
(SIPP). Presentation of these papers is sure to
stimulate constructive suggestions from the
scientific community. On the other hand, the
SIPP has been in progress since October 1983 and
a second panel is to be fielded in January 1985.
Hence, it is important that methodological issues
that impact directly upon data collection techniques
be resolved as quickly as possible.

The first of the three papers being reviewed
discusses person-level and household-level
cross-sectional weighting procedures for the 1979
Research Panel of the Income Survey Development
Program (ISDP), a nationwide field test for the
SIPP. The next two papers discuss person-level
and household- or family-level longitudinal
weighting methods being considered for the SIPP.
Each paper will be discussed individually,
although it will be noted that some comments
pertain to all three papers.

2.  Cross-Sectional Estimates for the ISDP by

The following motivational statement is
found early in Huang's paper:

    "There is a great deal of interest in
    developing cross-sectional weights at
    the time of each interview wave."

Due to the use of three rotation groups within
each wave, I question the use of wave-specific
cross-sectional weights for direct data analysis.
The inferential population would be difficult
to define due to the sequence of reference
periods applicable to the rotation groups (see
Figure 1). Wave-specific cross-sectional
weights are important for defining longitudinal
weights, as is apparent from the other two
papers being reviewed. Huang presents his
weight formulas in the context of weights for
cross-sectional estimates that are time-specific
rather than wave-specific, which is probably
more useful for data analysis. The formulas
presented in the paper can actually be considered
to be either wave-specific or time-specific
weights.

The proposed weighting procedures are discussed
in terms of estimation of the population
total

    X_t = \sum_{i=1}^{N} X_t(i),                              (1)

where i=1,...,N indexes the "units (persons or
households)" in the target population at time t.
The weighting formulas presented are for
cross-sectional household weights applicable as of
either time t or wave w. Since all "adult"
members of sample households are interviewed,
the cross-sectional household weights can be
assigned to all household members for
cross-sectional person-level analyses.

Cross-sectional household weights are presented
for both the area frame and the list
frame samples of the 1979 Research Panel. The
proposed weighting procedures are discussed
below for each sampling frame.

2.2  Area Frame Sample Weights

The population of inferential interest for
the 1979 Research Panel was defined to be the
1979 civilian, noninstitutionalized United
States adult population. Standard area frame
household sampling procedures were used to
select a sample of members of this population in
the Wave 1 sample, which was fielded early in
1979. However, only adults (aged > 16) in the
Wave 1 sample were followed when they moved to
new addresses during 1979. Thus, "additional"
people who entered the target population during
1979 (by birth, by entering the United States,
or by leaving the military or an institution)
were only interviewed while living in a household
that contained at least one Wave 1 sample
member. As a result, the sample fails to adequately
reflect "additional" people in the
target population. This issue will arise again
and be discussed more fully with regard to the
two SIPP methodology papers.

Two unbiased cross-sectional time t estimators
were discussed for estimation of the population
total, X_t. I found it instructive to reformulate
these estimators as follows:

1.  \hat{X}_{t,M} = \sum_{i=1}^{N} W_{t,M}(i) X_t(i),       (2)

where

    W_{t,M}(i) = \begin{cases}
        \sum_{j \in S_0} (r_i \pi_j)^{-1} & \text{for the i-th household in the time t sample,} \\
        0 & \text{for households not in the time t sample,}
    \end{cases}

    S_0 = {Wave 1 (time t_0) sample households},

    r_i = Number of households in the Wave 1
          universe contributing members to the
          i-th time t household, and

    \pi_j = Selection probability for the j-th Wave
          1 sample household.

2.  \hat{X}_{t,F} = \sum_{i=1}^{N} W_{t,F}(i) X_t(i),       (3)

where

    W_{t,F}(i) = \begin{cases}
        \sum_{j \in S_0} S_{ij} (S_{i0} \pi_j)^{-1} & \text{for the i-th household in the time t sample,} \\
        0 & \text{for households not in the time t sample,}
    \end{cases}

    S_{ij} = Number of members of the j-th Wave 1
          household who belong to the i-th time
          t household, and

    S_{i0} = Number of members of the Wave 1 universe
          who belong to the i-th time
          t household.

In Huang's paper the estimator (2) is referred
to as the multiplicity estimator and (3) is
referred to as the fair share estimator. In
fact, both (2) and (3) are multiplicity estimators.
The difference is that the weight $W_{t,M}$
is based upon household-level multiplicity and
$W_{t,F}$ is based upon person-level multiplicity.
The paper shows that both estimators provide
unbiased estimates of the population total, $X_t$,
invoking the "fair share assumption" for the
estimator (3). In fact, the weights $W_{t,M}$ and
$W_{t,F}$ are identical to the initial family weights
for the national household survey component of
the National Medical Care Utilization and Expenditure
Survey (NMCUES) [See Whitmore, et al
(1982a)]. In the NMCUES report, it is shown
that both of these estimators provide unbiased
estimates, even without the "fair share assumption"
for the estimator (3). Huang's conclusion
that the estimator (3) is preferable mainly
because it produces less variable weights and
hence smaller sampling variances is also supported.

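The two weights in (2) and (3) can be illustrated with a small numerical sketch. The Python below is only an illustration under stated assumptions, not code from Huang's paper or the NMCUES report: the function names, the dictionary layout, and the selection probabilities are hypothetical, and the sum over Wave 1 sample households is assumed to run only over those households that contribute members to the time t household.

```python
from fractions import Fraction

# Hypothetical time t household i: 3 of its members came from Wave 1
# household "A" and 1 member came from Wave 1 household "B".
# Each entry is (pi_j, S_ij): selection probability and members contributed.
contributors = {
    "A": (Fraction(1, 1000), 3),
    "B": (Fraction(1, 2000), 1),
}

def weight_m(sampled, contributors):
    """Household-level multiplicity weight, as in (2)."""
    r_i = len(contributors)  # Wave 1 universe households contributing members
    return sum(1 / (r_i * contributors[j][0]) for j in sampled)

def weight_f(sampled, contributors):
    """Person-level ("fair share") multiplicity weight, as in (3)."""
    s_i0 = sum(s_ij for _, s_ij in contributors.values())  # Wave 1 universe members
    return sum(contributors[j][1] / (s_i0 * contributors[j][0]) for j in sampled)

def expected_weight(weight_fn, contributors):
    """E[W(i)] by linearity: each household j enters the sample with probability pi_j."""
    return sum(contributors[j][0] * weight_fn([j], contributors) for j in contributors)

w_m_a = weight_m(["A"], contributors)   # only household A was sampled
w_f_a = weight_f(["A"], contributors)
e_m = expected_weight(weight_m, contributors)
e_f = expected_weight(weight_f, contributors)
```

In this made-up case both weights have expectation one for the time t household, which is the sense in which both estimators are unbiased; the sketch says nothing about which weight is less variable in practice.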
2.3  List  Frame  Sample  Weights 

Huang's  paper  defines  the  population  of 
inferential  interest  for  the  sample  based  upon 
SSI  and  BEOG  lists  as  follows: 

"At  any  time  t,  the  target  population 
consists  of  the  original  list  frame 
subpopulation  (Groups  I  and  II)  and 
the  type  of  'additions'  defined  for 
the  area  frame." 
Hence,  the  time  t  target  population  is  the  Wave 
1  universe  plus  "additions."   Additions  for  the 
area  frame  sample  were  civilian,  noninstitu- 
tionalized  United  States  adults  who  joined  this 
group  by  birth,  by  entering  the  United  States, 
or  by  leaving  the  military  or  an  institution.   I 
expect  that  the  author  does  not  intend  to  in- 
clude all  such  additions  in  the  target  popula- 
tion since  the  Wave  1  universe  does  not  include 
all  civilian,  noninstitutionalized  United  States 
adults.   Maybe  only  those  additions  that  simul- 
taneously enter  the  universe  and  enter  a  house- 
hold containing  a  member  of  the  Wave  1  universe 
are  intended  to  belong  to  the  target  population. 
In  any  case,  the  field  procedures  did  not  pro- 
vide adequate  coverage  of  additional  target 
population    members    because    only    adults 
(aged  >  16)  in  the  Wave  1  sample  were  followed 
when  they  moved  to  new  addresses  during  1979,  as 
was  true  for  the  area  frame  sample. 

Two cross-sectional time t estimators of the
population total, $X_t$, were presented. I found
it instructive to reformulate these estimators
as follows:

1.
$$\hat{X}^{*}_{t,M} = \sum_{k} W^{*}_{t,M}(k)\, X_t(k), \qquad (4)$$

where

$$W^{*}_{t,M}(k) = \begin{cases} \sum_{i \in S_0} [(\beta_k + U_k)\, \pi_i]^{-1} & \text{for the } k\text{-th household in the time } t \text{ sample containing a member from } S_0, \\ 0 & \text{for households in the time } t \text{ sample containing no members from } S_0, \end{cases}$$

$S_0$ = {Wave 1 list frame (Group I) sample
persons},

$\beta_k$ = Number of Wave 1 list frame (Group I)
persons in the k-th time t household,

$U_k$ = 1 if the k-th time t household contains
any additions (Group III persons),
0 otherwise, and

$\pi_i$ = Selection probability for the i-th
Wave 1 list frame (Group I) sample
person.

2.
$$\hat{X}^{*}_{t,F} = \sum_{k} W^{*}_{t,F}(k)\, X_t(k), \qquad (5)$$

where $W^{*}_{t,F}(k)$ is a sum over $j \in S$ for households
in the time t sample [summand illegible in the source]
and is 0 for households not in the time t sample,

$S$ = {Wave 1 (time $t_0$) sample households},

$S_{kj}$ = Number of members (Groups I and II) of
the j-th Wave 1 sample household who
belong to the k-th time t household,

[symbol illegible in the source] = (Total number of members of Wave 1
sample households who belong to the
k-th time t household) + (Number of
additions (Group III persons) in the
k-th time t household), and

[symbol illegible in the source; subscript 0j] = Unbiased multiplicity-adjusted weight
for the j-th Wave 1 household.



The  paper  notes  that  the  estimators  (4)  and 
(5)  do  not  provide  unbiased  estimates  of  the 
population total, $X_t$. Part of the problem may
be  that  additional  (Group  III)  sample  members 
explicitly  enter  the  weight  computations.  Since 
households  in  the  sample  must,  by  definition, 
contain  at  least  one  Group  I  or  II  sample  mem- 
ber, Group  III  persons  need  not  explicitly  enter 
the  weight  computations. 

It should be noted that the alternative
weights $W^{*}_{t,M}$ and $W^{*}_{t,F}$ do not give positive
weights to identically the same households.
Time t households that contain Group II people,
but no Group I people, are given a weight of
zero by $W^{*}_{t,M}$, whereas $W^{*}_{t,F}$ is positive for these
households.

Consideration  should  be  given  to  defining  the 
person-level  target  population  as  simply  the 
original  list  frame  (Group  I)  persons.  Weights 
similar  in  definition  to  (2)  and  (3)  can  then  be 
defined  that  provide  unbiased  estimates  of 
population  totals  for  this  target  population. 
These  weights  would  be  essentially  the  same  as 
the  initial  family  weights  used  for  defining 
longitudinal  family  weights  for  the  state  Medi- 
caid household  survey  component  of  the  NMCUES 
[See  Whitmore  et  al  (1982b)]. 
3.  Person-Level  Longitudinal  Weights  for  the 
SIPP  by  Judkins,  et  al 

Early  in  the  Judkins  paper,  it  is  stated 
that  when  an  interview  is  missing  for  a  wave  and 
is  bracketed  by  good  interviews,  imputation  will 
probably  be  used  for  the  missing  wave.  Why  not 
use  a  longer  reference  period  for  the  next  wave 
interview  and  collect  the  data  directly,  as  was 
done  in  the  NMCUES? 

The  SIPP  universe  at  any  fixed  point  in  time 
is  defined  as  the  persons  aged  15  or  older  who 
are  members  of  the  civilian,  noninstitutional 
United  States  population,  as  well  as  members  of 
the  military  living  on  bases  with  family  or 
living  off  bases.  Dynamic  longitudinal  features 
of  this  universe  are: 

1.  "Additions"  -  Individuals  who  were  not 
members  of  the  Wave  1  Universe  but  became 
members  of  the  SIPP  universe  during  the 
panel's  2  2/3  year  reference  period. 

2.  "Exits"  -  Individuals  who  left  the  SIPP 
universe  during  the  2  2/3  year  reference 
period  due  to  death,  moving  out  of  the 
United  States,  or  going  into  the  military 
or  an  institution. 

The  Wave  1  interview  should  probe  for  the  occur- 
rence of  such  events  during  the  Wave  1  reference 
period.  As  was  true  for  the  ISDP,  only  Wave  1 
sample  members  are  followed  to  new  addresses 
when  they  move,  and  current  SIPP  survey  proce- 
dures do  not  provide  adequate  coverage  of  the 
"additional"  target  population  members.  Methods 
for  improving  coverage  of  the  "additional" 
target  population  members  will  be  discussed 
later  in  this  section. 

The  Judkins  paper  indicates  that  the  ideal 
annual  longitudinal  universe  is  the  union  of  12 
monthly  universes.  Either  this  universe  or  the 
union  of  366  daily  universes  should  be  the 
target  population.  The  problem  of  analysis  of 
annual  statistics  when  some  population  members 
are  survey-eligible  for  less  than  the  full  year 
is  noted  as  one  difficulty  with  this  target 
population  definition.  I  believe  that  methods 
exist  or  can  be  developed  to  adequately  address 
this  problem.  For  example,  estimation  of  an 
annual  mean  can  be  based  upon  the  following 
statistics: 

$Y_a(i)$ = Annual income of the i-th sample
member while survey-eligible,

$P_a(i)$ = Proportion of the days in the year
that the i-th sample member was
survey-eligible, and

$W(i)$ = Longitudinal analysis weight for
the i-th sample member.

The population totals for $Y_a$ and $P_a$ would be
estimated unbiasedly as follows:

$$N(a) = \sum_{i \in \text{sample}} W(i)\, Y_a(i), \qquad (6)$$

$$D(a) = \sum_{i \in \text{sample}} W(i)\, P_a(i). \qquad (7)$$

These estimators would have the following
interpretation:

N(a) = Unbiased estimate of total annual
personal income for the target
population, and

D(a) = Unbiased estimate of the average daily
number of members in the target
population.

Hence, the ratio estimator,

$$R(a) = N(a) / D(a), \qquad (8)$$

would provide a consistent estimate of the
average annual personal income.
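
The ratio estimator can be sketched in a few lines of Python. This is a minimal illustration that assumes N(a) and D(a) are the weighted sample sums of annual income and of the eligible proportion of the year; the function name, the tuple layout, and the three sample records are hypothetical.

```python
def annual_mean_income(records):
    """Return (N(a), D(a), R(a)) from (W, Y_a, P_a) tuples.

    W   : longitudinal analysis weight
    Y_a : annual income received while survey-eligible
    P_a : proportion of the days in the year the person was survey-eligible
    """
    n_a = sum(w * y for w, y, _ in records)  # estimated total annual income
    d_a = sum(w * p for w, _, p in records)  # estimated average daily population
    return n_a, d_a, n_a / d_a

# Hypothetical sample: one full-year member, one half-year member,
# and one quarter-year member with no income.
sample = [
    (1000.0, 20000.0, 1.0),
    (1000.0, 6000.0, 0.5),
    (2000.0, 0.0, 0.25),
]
n_a, d_a, r_a = annual_mean_income(sample)
```

Because part-year members contribute only their eligible fraction to the denominator, their partial-year income does not pull the estimated annual mean downward.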

Estimation  of  the  population  distribution  of 
annual  statistics,  such  as  total  annual  personal 
income,  is  somewhat  more  difficult.  The  income 
of  a  sample  member  who  was  survey-eligible  only 
part  of  the  year  requires  special  treatment. 
The  NMCUES  defined  a  time-adjusted  income  de- 
fined for  each  sample  member  as 

TAY(i)  =  Ya(i)  /  Pa(i),  (9) 

and  produced  the  distribution  of  these  time- 
adjusted  values.  Another  possibility  is  to 
produce  separate  distributions  of  annual  income 
for  individuals  who  were  survey-eligible  for  12 
months,  11  months,  10  months,  etc.  A  third 
possibility  might  be  to  simply  estimate  the 
annual  average  monthly  income  based  upon  all 
sample members who were survey-eligible for one
month or more, instead of the average annual
income.

Four longitudinal weighting procedures are
discussed in the Judkins paper. The first procedure
defines a longitudinal weight applicable for all
longitudinal  analyses  of  an  individual's  data, 
irrespective  of  the  analysis  time  period.  A 
weight  of  this  type  is  definitely  needed  for 
each  sample  member  to  facilitate  all  types  of 
longitudinal  analyses.  This  first  procedure 
gives  zero-valued  weights  to  all  "associated" 
sample  members.  These  data  are  collected  mainly 
to  enable  family  and  household  analyses.  The 
other  procedures  attempt  to  make  greater  use  of 
the  data  for  "associated"  sample  members  by 
giving some of them positive weights for particular
analysis time periods. Since these
"associated" sample members had a chance of
inclusion in the Wave 1 sample and were not
selected, the bias and variance reduction properties
of these procedures would have to be
investigated carefully before these procedures
could be recommended. Empirical studies based
upon  the  longitudinal  data  collected  by  the 
ISDP,  NMCUES,  and/or  National  Medical  Care 
Expenditure  Survey  (NMCES)  could  provide  the 
basis  for  resolving  this  issue. 

A weighting procedure similar to the first
procedure in the Judkins paper can provide improved
coverage of the target population with some
modification of SIPP field procedures. The
changes in field procedure would be the following:

1.  Each  "additional"  sample  member  becomes  a 
"key  addition"  (i.e.,  to  be  followed  to  the 
end  of  the  2  2/3  year  panel  and  receive 
positive longitudinal weights) if the first
household  that  the  person  belongs  to  after 
entering  the  universe  is  a  sample  house- 
hold. 

2.  The  Wave  1  sample  housing  units  (and  the 
half-open  intervals  between  sample  housing 
units  and  next  listed  housing  units)  are  to 
be monitored throughout the 2 2/3 year
panel  for  entry  of  "additional"  universe 
members.  If  such  "additional"  people  move 
into  one  of  these  housing  units  and  estab- 
lish their  own  independent  household  as 
their  first  household  after  re-entry  into 
the  universe,  they  are  also  "key  addi- 
tions." 

Using this data collection protocol, all longitudinal
weights can be based upon selection
probabilities  for  Wave  1  sample  households  as 
follows: 

1.  For  each  member  of  a  Wave  1  sample  house- 
hold, the  longitudinal  weight  is  the  recip- 
rocal of  the  selection  probability  for  that 
household. 

2.  Every  "key  additional"  sample  member  can  be 
linked  uniquely  to  either  a  Wave  1  sample 
household  or  a  Time  t  (time  of  entry  into 
the  universe)  sample  household.  Hence,  the 
longitudinal  weight  for  such  a  person  is 
either  the  reciprocal  of  the  selection 
probability  for  the  uniquely  linked  Wave  1 
household  or  the  Time  t  cross-sectional 
weight  of  the  uniquely  linked  sample  house- 
hold. 

3.  All  "associated"  sample  members  and  other 
"additional"  sample  members  get  a  weight  of 
zero  because  they  could  have  been  selected 
into  the  sample,  but  were  not. 
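
The three weighting rules above can be expressed as a short sketch. The Python below is an illustration only: the function name, the member-type labels, and the argument names are hypothetical and do not come from the Judkins paper or from SIPP processing systems.

```python
def longitudinal_weight(member_type, wave1_selection_prob=None,
                        time_t_cross_sectional_weight=None):
    """Assign a longitudinal weight under the proposed field protocol.

    member_type is one of the hypothetical labels "wave1", "key_addition",
    "associated", or "other_addition".
    """
    if member_type == "wave1":
        # Rule 1: reciprocal of the Wave 1 household selection probability.
        return 1.0 / wave1_selection_prob
    if member_type == "key_addition":
        # Rule 2: use the uniquely linked Wave 1 household if there is one,
        # otherwise the Time t cross-sectional weight of the linked household.
        if wave1_selection_prob is not None:
            return 1.0 / wave1_selection_prob
        return time_t_cross_sectional_weight
    # Rule 3: associated members and other additions get zero weight.
    return 0.0

w_original = longitudinal_weight("wave1", wave1_selection_prob=0.25)
w_key_linked = longitudinal_weight("key_addition", wave1_selection_prob=0.25)
w_key_time_t = longitudinal_weight("key_addition",
                                   time_t_cross_sectional_weight=1500.0)
w_associated = longitudinal_weight("associated")
```

The selection probability of 0.25 is chosen only to keep the arithmetic exact; real household selection probabilities would be far smaller.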

The person-level longitudinal weight adjustment
procedures discussed in the Judkins paper seem
reasonable. I would, however, only recommend
the  cross-sectional  consistency  adjustments  to 
monthly  totals  and  equalization  within  marriage 
groups  if  the  adjustments  to  the  post-stratified 
weights  were  minor. 

4. Household- and Family-Level Longitudinal
Weights for the SIPP by Ernst, et al

Ernst  presents  four  longitudinal  household 
definitions  for  consideration.  Preference  is 
indicated  for  a  "Shared  Experiences"  definition. 
What  is  the  justification  for  choosing  this 
definition?  More  consideration  should  be  given 
to  the  question:  "What  longitudinal  household 
or  family  definitions  are  most  useful  for  ad- 
dressing analysis  issues?" 


Ernst  suggests  that  longitudinal  families  not 
be  identified  as  such  but  rather  that  longitudi- 
nal households  be  classified  as  family  and 
non-family  households.  The  desirability  of  this 
approach  is  questionable.  Families  that  exist 
either  long-term  or  short-term  as  multi-family 
households  are  potentially  important  for  family- 
level  analyses.  Based  upon  the  NMCES  and  NMCUES 
experience,  it  is  not  especially  difficult  to 
divide  households  into  family  reporting  units 
for  data  collection. 

Consideration  should  be  given  to  identifying 
the  properties  that  one  would  like  all  longi- 
tudinal households  or  families  to  satisfy.  Such 
properties  might  include  the  following: 

1.  Since  cross-sectional  families  are  well- 
defined  at  any  fixed  point  in  time,  it  may 
be  desirable  for  the  longitudinal  families 
in  existence  at  any  fixed  point  in  time  to 
be  identical  to  the  cross-sectional  fami- 
lies in  existence  at  that  same  point  in 
time. 

2.  It  may  be  desirable  for  changes  in  house- 
hold composition  that  strongly  affect 
family  income  or  program  participation  to 
trigger  the  beginning  and  ending  of  SIPP 
longitudinal  families. 

Some  questions  like  "What  longitudinal  family 
definition  is  most  useful  for  assessing  the 
effect  of  divorce  on  family  income?"  should  be 
addressed  in  detail  before  adopting  a  SIPP 
longitudinal  family  definition.  In  fact,  con- 
sideration of  how  to  best  address  analysis 
issues  may  suggest  that  multiple  longitudinal 
family  definitions  are  needed  to  satisfy  multi- 
ple analysis  objectives. 

The  Ernst  paper  presents  five  longitudinal 
household  weighting  procedures.  Each  weighting 
procedure  is  based  on  cross-sectional  household 
weights  that  are  equivalent  to  Huang's  "fair 
share"  weight.  This  appears  to  be  the  proper 
basis  for  longitudinal  household  weights. 

The  need  for  data  for  time  periods  when  the 
longitudinal  family  is  not  in  the  sample  is 
investigated.  The  need  for  this  additional 
retrospective  or  prospective  data  depends  upon 
both  the  family  definition  and  the  weighting 
procedure. 

Use  of  longitudinal  weights  applicable  only 
to  specific  time  periods  is  discussed  as  a  means 
for  making  use  of  more  of  the  data  collected  for 
specific  time  periods.  As  noted  in  the  paper, 
these  procedures  also  tend  to  require  the  great- 
est amount  of  data  for  time  periods  when  the 
family  is  not  in  the  sample.  The  variance/bias 
tradeoff  would  have  to  be  carefully  investigated 
for  these  procedures  before  they  could  be  recom- 
mended. Empirical  investigations  based  upon  the 
ISDP,  NMCUES,  and/or  NMCES  databases  may  be 
useful  in  this  regard.  In  any  case,  it  is 
important  to  have  a  longitudinal  weight  appli- 
cable for  all  time  periods  to  enable  longi- 
tudinal family  analyses  of  all  kinds. 

One  shortcoming  of  all  family  weighting 
procedures  suggested  by  Ernst  is  that  the  fami- 
lies spawned  by  "additional"  sample  members  all 
get  zero  weights.  The  paper  states  that  the 
first  procedure  discussed  is  the  procedure  used 
by  the  NMCUES.   This  is  not  exactly  true  because 
the NMCUES traced certain types of "key additional"
sample members and assigned positive
weights to the families spawned by them. The
procedures discussed with regard to the Judkins
paper are recommended for identifying and tracing
"key additional" people. Given these survey
procedures, an unbiased "beginning date" type of
longitudinal family weighting procedure is presented
in Horvitz and Folsom (1980). Review of
this paper is highly recommended to everyone
interested in longitudinal surveys.

The  weight  adjustment  procedures  discussed  in 
the  Ernst  paper  appear  to  be  appropriate  and 
satisfactory  for  the  most  part.  However,  weight 
adjustment  is  discussed  as  a  method  for  compen- 
sating for  lack  of  data  for  specific  time  inter- 
vals, e.g.,  prior  to  the  first  interview  or 
following  the  last  interview.  In  order  to 
adjust  for  this  type  of  nonresponse,  the  NMCUES 
used  attrition  imputation  procedures.  I  feel 
that  attrition  imputation  is  a  more  satisfactory 
solution  because  it  can  address  all  data  missing 
due  to  attrition  at  once  and  the  resulting 
database  is  more  amenable  to  analysis.  Finally, 
I  would  only  recommend  the  final  adjustment  of 
longitudinal  family  weights  to  monthly  controls 
if the adjustments were minor.

SURVEY OF INCOME
AND 
PROGRAM  PARTICIPATION: 
SESSION  IV 


This  section  is  comprised  of  five  papers  presented 
in  this  session  which  was  sponsored  by  the  Section 
on  Survey  Research  Methods. 


MONTH-TO-MONTH  RECIPIENCY  TURNOVER  IN  THE  ISDP 
Jeffrey C. Moore and Daniel Kasprzyk, U.S. Bureau of the Census


The major impetus to the development of the
Survey of Income and Program Participation
(SIPP) was the need for more detailed and
better quality income data than were available
through current survey programs, most notably
the March income supplement to the Current
Population Survey (CPS) (David, 1983; Ycas and
Lininger, 1981). The SIPP itself has only been
in the field since October of 1983, so there
are not yet sufficient data for a thorough
assessment of its performance. However, the
precursor to the SIPP, the Income Survey
Development Program (ISDP), is an available
and underutilized data source offering a
wealth of information to researchers with
interests in a wide range of SIPP-related
issues.

Background 

This  paper  uses  the  1979  Panel  of  the  ISDP 
to  examine  a  particular  data  quality  problem 
concerning  month-to-month  turnover  in  the 
receipt  of  various  income  types.  The  basic 
question, first raised by Czajka (1982), is as
follows:  given  six  monthly  observations  over 
two  consecutive  survey  waves  (each  of  which 
covers  retrospectively  a  3-month  period),  what 
is  the  pattern  of  recipiency  turnover  in  the 
resulting  five  pairs  of  months?  Czajka's 
interpretation  of  tables  prepared  for  another 
purpose  by  Lepkowski  and  Kalton  (1981)  was 
that  in  survey  waves  1  and  2  of  the  1979  panel 
there  was  "a  pronounced  tendency  for  reported 
program  turnover  to  occur  between  waves  more 
often  than  within  waves— i.e.,  between  months 
three  and  four  rather  than  the  four  other  pairs 
of months" (p. 93). Moore (1983), however, in
a  quantitative  analysis  of  the  Lepkowski  and 
Kalton  tables,  failed  to  find  the  effect 
suggested  by  Czajka.1/ 

This discrepancy between the two investigations
is attributable to differing interpretations
of one of the response indicators in the
tables: specifically, whether a particular code
indicated "no data" (i.e., a case which could
not be matched across the two waves) or "no
receipt." Notwithstanding this confusion, two
additional factors argued strongly for a more
careful examination of the issue. First was
the issue of completeness. For their work,
Lepkowski and Kalton linked only the first two
waves of the 1979 panel, leaving untouched
waves 3, 4, and 5. A second shortcoming had
to do with the quality of the linking operation
itself. Lepkowski and Kalton had at their disposal
only an early version of the ISDP data
file, which contained numerous errors in the
person identifier code crucial to the linking
of survey records across waves.2/

Subsequent work carried out by Mathematica
Policy Research, Inc., apparently corrected the
problems with the person identifiers, resulting
in the creation of a linked data file which had
substantially more matches than the earlier
file produced by the Michigan group. In addition,
all five relevant waves of the 1979 Panel
were included in the linking operation. The
remainder of this paper analyzes and discusses
tabulations derived from the later "definitive"
edition of the 1979 ISDP data file to address
more conclusively the issue of within-wave versus
between-wave month-to-month income recipiency
turnover.

Method  and  Results 

The income types selected for analysis here
were identical to the set used in the original
Lepkowski and Kalton paper: the two major
earned income categories (wage or salary income;
self-employment or farm income), and 15 additional
sources including all of the major
government transfer programs (e.g., Social
Security; Supplemental Security Income; unemployment
compensation; veterans benefits; Aid
to Families with Dependent Children (AFDC);
food stamps; etc.). For these major programs,
each respondent in two consecutive waves of the
ISDP has six monthly observations; we use the
term "month-pair" to refer to each pair of
successive months. Thus, each set of linked
waves includes five month-pairs, which can be
designated as 1->2 and 2->3 (within survey
wave n), 3->4 (the last month of wave n and the
first month of wave n+1), and 4->5 and
5->6 (within wave n+1). For each income type
in each month-pair, a turnover rate ($P_{i(i+1)}$)
was calculated as the number of adult sample
persons3/ who changed recipiency status with
regard to income source X (i.e., who received
income of type X in the first month of the pair
but not in the second, or vice versa) divided by
the total number of adult sample persons. The
between-wave rate, $P_{34}$, was then compared to
the average of the within-wave rates,
$\bar{P} = \frac{1}{4}(P_{12} + P_{23} + P_{45} + P_{56})$. The difference
between these two values, $P_{diff} = P_{34} - \bar{P}$,
comprises the major variable of interest for
this paper.
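
The turnover calculation can be made concrete with a short sketch. The Python below is an illustration rather than the authors' tabulation program; the function names and the four example recipiency histories are hypothetical, and each history is assumed to be a sequence of six 0/1 monthly recipiency indicators for one income type.

```python
def turnover_rates(histories):
    """Return [P12, P23, P34, P45, P56] for one income type.

    Each rate is the number of persons whose recipiency status differs
    between the two months of the pair, divided by the number of persons.
    """
    n = len(histories)
    rates = []
    for m in range(5):
        changed = sum(1 for h in histories if h[m] != h[m + 1])
        rates.append(changed / n)
    return rates

def p_diff(rates):
    """Between-wave rate minus the average of the four within-wave rates."""
    p12, p23, p34, p45, p56 = rates
    p_bar = (p12 + p23 + p45 + p56) / 4
    return p34 - p_bar

# Hypothetical recipiency histories for four adult sample persons.
histories = [
    [1, 1, 1, 1, 1, 1],  # received in every month
    [1, 1, 1, 0, 0, 0],  # change falls between the waves
    [0, 0, 0, 1, 1, 1],  # change falls between the waves
    [0, 1, 1, 1, 1, 0],  # two within-wave changes
]
rates = turnover_rates(histories)
diff = p_diff(rates)
```

A positive difference means that more status changes fall on the seam between the two waves than on the average within-wave month-pair.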

Table 1 summarizes the results of a simple
test of significance4/ carried out on each
$P_{diff}$ for the 17 income types across all sets
of linked survey waves5/. The message of
Table 1 is unmistakable. There is a strong
and consistent tendency toward greater turnover
in recipiency between survey waves than between
months within a wave. Of the 85 $P_{diff}$ observations
in Table 1, 78 are positive (i.e., $P_{34} > \bar{P}$).
Sixty-nine of the differences are significantly
positive; 51 are significant at the
p<.01 level or beyond. In contrast, only one
difference is significant in the opposite
direction.

Almost  as  obvious  as  the  general  trend  in 
Table  1  are  its  two  apparent  exceptions.  Six 
of  the  seven  negative  difference  scores  (includ- 
ing the  only  significantly  negative  value) 
are concentrated in two closely related
income sources: educational benefits and Basic
Educational Opportunity Grants (BEOG). The
only explanation we have for these outliers
follows from the fact that they involve one-time
payments at the beginning of school terms.
Thus,  their  receipt  may  be  more  easily  "date- 
able"  than  other  income  sources,  and  the  single 
payment  means  that  accurate  reporting  can  never 
produce  more  between-wave  than  within-wave 
turnover.  Aside  from  these  relatively  weak 
exceptions, however, it is clear that the great
majority  of  income  sources  display  an  exagger- 
ated turnover  rate  between  survey  waves.  The 
important question then becomes: Why is this
the  case? 

Discussion 

Although  it  is  perhaps  the  most  commonly 
assumed  explanation,  response  error  is  by  no 
means  the  only  possible  source  of  the  effects 
observed  in  this  paper,  nor  is  it  necessarily 
the  most  likely  source.  In  this  final  section, 
we  briefly  examine  four  potential  contributors 
to  greater  between-wave  than  within-wave  recip- 
iency turnover:  real  underlying  trends,  edit 
and  imputation  procedures,  person  mismatches 
in  linking  data  from  successive  survey  waves, 
and  response  error. 

Real  underlying  trends:  Since  this  investi- 
gation  is  without  the  benefit  of  external  vali- 
dating information,  we  cannot  demonstrate 
conclusively  that  the  observed  results  indicate 
"error"  as  opposed  to  reflecting  accurately 
real  underlying  trends  in  the  events  being 
measured.  Two  facts,  however,  render  the  lat- 
ter hypothesis  untenable:  1)  a  change  in  eco- 
nomic conditions  or  eligibility  rules  could 
produce  an  increase  in  recipiency  turnover  at 
a particular point in time, but it is difficult
to  imagine  this  happening  periodically  for  a 
wide  range  of  income  types  over  an  extended 
period  of  time;  2)  the  staggered  interviewing 
schedule for the 1979 ISDP Panel (see Ycas and
Lininger, 1981) further reduces this likelihood,
since each calendar month over the life
of  the  panel  served  as  the  first  reference 
month  of  a  wave  for  one  set  of  respondents,  the 
second  reference  month  for  another  set,  and  the 
third  month  for  a  third  set.  In  other  words, 
each  reference  month  in  a  survey  wave  combines 
data  from  three  calendar  months,  so  that  any 
real  change  effects  are  present  only  in  diluted 
form  in  three  reference  months. 

Edit  and  imputation  procedures:  Three  proc- 
essing procedures  possibly  contributed  to 
greater  recipiency  turnover  between  waves  than 
within  waves:  reformatting  edits  to  simplify 
and  make  consistent  various  data  fields,  imputa- 
tion for  person  nonresponse,  and  imputation 
for  item  nonresponse. 

The  only  known  problem  with  the  reformatting 
edits  is  that  they  were  carried  out  independ- 
ently for  each  wave;  incorrect  resolutions  in 
the  name  of  consistency  thus  may  have  artifi- 
cially reduced  turnover  within  waves,  while 
reporting  inconsistencies  between  waves  were 
ignored.  Another  edit  decision  which  may  have 
contributed to the phenomenon of less turnover
within waves than between waves was the following:
if at least one "yes" was reported for an
income type, and/or if at least one monthly
amount was a valid nonzero amount, then any
blank monthly recipiency indicators were set to
"yes" and any blank monthly amounts were imputed
using the average of the amounts reported in
other months. The obvious effect of such a
procedure is to reduce the apparent amount of
change within a wave. Unfortunately, these
edits were not identified on the data file.
As a result, the extent to which they affected
the results presented here is not known,
although their combined impact is likely to be
small.

Another  possible  contributor  to  the  observed 
effect  is  the  treatment  of  person  noninterviews 
within  interviewed  households.  Because  there 
were,  in  fact,  few  such  cases  (only  298  in 
Wave  1),  an  imputation  procedure  was  developed 
to  substitute  complete  person  records  for  the 
otherwise  missing  data.  The  procedure  used 
reported  demographic  data  as  matching  variables 
in  a  hot-deck  assignment.  Since  each  wave's 
data were processed independently, it is highly
unlikely  that  an  individual  who  was  a  nonrespon- 
dent  in  each  of  two  consecutive  waves  would 
receive  the  same  imputation  donor  for  both 
waves.  Consequently,  some  spurious  wave-to-wave 
change  could  occur  solely  as  an  artifact  of  the 
independent  processing. 

The  same  argument  applies  to  the  case  of  item 
nonresponse  within  a  person's  record.  The 
presence  of  valid  data  in  one  wave  and  the 
absence  of  valid  data  in  the  next  (or  vice 
versa)  suggests  possible  problems  for  between- 
wave analyses because the ISDP imputation system
did  not  take  previous  (subsequent)  reporting 
patterns  into  account.  In  addition,  if  a 
respondent  did  not  provide  information  for  a 
specific  item  on  two  successive  waves  of  inter- 
viewing, it  is  likely  that  different  imputation 
donors  provided  the  missing  data  in  each  wave. 

Mismatches:  Technically,  of  course,  although 
respondents  do  report  month-to-month  turnover 
within  a  survey  wave,  it  is  incorrect  to  refer 
to  respondents'  "reports"  of  between-wave  turn- 
over. These  events  are  created  by  the  computer- 
ized process  which  links  together  the  data  for 
specific  individuals  across  survey  waves.  To 
the  extent  that  people  are  incorrectly  linked, 
a certain amount of artifactual turnover may
appear in the month-pair which connects the two
waves.  Preliminary  simulation  work  suggests 
that  mismatching  need  not  be  extensive  to 
produce  within-wave  versus  between-wave  differ- 
ences of  the  magnitudes  observed  in  Table  1. 
In  fact,  for  most  of  the  income  types  in  this 
paper,  a  mismatch  rate  of  3  percent  or  less 
would  produce  an  apparent  increase  in  turnover 
quite  comparable  to  the  observed  increase  from 
within-wave  month-pairs  to  between-wave  pairs. 
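
The arithmetic behind this claim can be sketched as follows. This is only an illustration under assumptions that are not taken from the paper: a mismatched link is assumed to join two unrelated persons whose receipt statuses are independent, each receiving the income type with probability `prevalence`. The function name, parameter names, and numeric values are hypothetical.

```python
def apparent_turnover(true_rate, mismatch_rate, prevalence):
    """Expected observed turnover rate in a linked month-pair.

    Assumption (not from the paper): a mismatched link joins two
    unrelated persons with independent receipt statuses, so their
    statuses disagree with probability 2 * prevalence * (1 - prevalence).
    """
    disagree_if_mismatched = 2.0 * prevalence * (1.0 - prevalence)
    return (1.0 - mismatch_rate) * true_rate + mismatch_rate * disagree_if_mismatched


# Illustrative values only (not taken from Table 1).
within = apparent_turnover(true_rate=0.01, mismatch_rate=0.0, prevalence=0.30)
between = apparent_turnover(true_rate=0.01, mismatch_rate=0.03, prevalence=0.30)
print(within, between)
```

Under these assumed values, a 3 percent mismatch rate alone more than doubles the apparent turnover in the month-pair that links the waves, even though true turnover is unchanged.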

It  is  impossible  after  the  fact  to  determine 
the  impact  of  person  mismatches  on  the  estimates 
of  between-wave  turnover  in  the  1979  panel. 
Returning  to  the  discrepancy  between  the  early 
Lepkowski and Kalton data and the subsequent
refined file, one intriguing possibility is
that although the former produced fewer matches
than the latter, the matches that were completed
may have been relatively error-free. If this
were the case (that is, if the Michigan group
somehow skimmed off the definite matches), then
the  appearance  of  heightened  between-wave 
turnover  in  the  later  data  file  may  simply 
reflect  increased  match  errors.  Clearly,  eval- 
uating the  impact  of  match  errors  in  turnover 
estimates  from  the  SIPP  will  require  maintain- 
ing data  on  the  quality  of  the  match  for  each 
person,  perhaps  in  the  form  of  a  scale  showing 
the  number  of  variables  which  were  identical 
across  the  linked  waves. 
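
A match-quality scale of the kind suggested here could be as simple as the following sketch. The variable names and the sample records are hypothetical, and the paper does not specify which variables would be compared or how the scale would be stored.

```python
def match_quality(wave_a, wave_b, variables):
    """Count the comparison variables that are present and identical
    in both linked records."""
    score = 0
    for name in variables:
        if name in wave_a and name in wave_b and wave_a[name] == wave_b[name]:
            score += 1
    return score


# Hypothetical matching variables and linked person records.
VARIABLES = ("sex", "birth_year", "race", "relationship")
wave3 = {"sex": "F", "birth_year": 1950, "race": "W", "relationship": "head"}
wave4 = {"sex": "F", "birth_year": 1951, "race": "W", "relationship": "head"}

score = match_quality(wave3, wave4, VARIABLES)
print(score, "of", len(VARIABLES), "variables agree")
```

Turnover estimates could then be tabulated separately by score, so that any dependence of between-wave turnover on match quality becomes visible.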

Response  error:  Perhaps  the  most  common 
explanation for the effects observed in this
paper involves some form of recall bias. This
was certainly Czajka's (1982) assumption.
Presumably, a gestalt-like process operates in
response to imperfect recall, leading respondents
to report receipt for the entire 3-month
period of a single wave as having been more
stable than it really was. Such a process
would work in two ways to produce more reports
of between-wave than within-wave turnover:
first, by reducing the number of within-wave
turnover  episodes  (see  Example  1);  and  second, 
by  shifting  the  occurrence  of  turnover  episodes 
to  the  between-wave  period  (Example  2). 

                      wave n          |  wave n+1

Example 1

actual receipt:       yes   no    yes |  no    yes   no

reported receipt:     yes   yes   yes |  no    no    no

Example 2

actual receipt:       yes   yes   yes |  yes   no    no

reported receipt:     yes   yes   yes |  no    no    no
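
The two examples can be checked mechanically. The sketch below counts status changes in a six-month sequence and classifies each as within-wave or between-wave, assuming 3-month waves as in the examples. The function name is hypothetical.

```python
def count_turnover(months, wave_length=3):
    """Return (within_wave, between_wave) counts of month-to-month changes."""
    within = between = 0
    for i in range(len(months) - 1):
        if months[i] != months[i + 1]:
            # The pair straddles two waves when the later month opens a new wave.
            if (i + 1) % wave_length == 0:
                between += 1
            else:
                within += 1
    return within, between


example1_actual = ["yes", "no", "yes", "no", "yes", "no"]
example1_reported = ["yes", "yes", "yes", "no", "no", "no"]
example2_actual = ["yes", "yes", "yes", "yes", "no", "no"]
example2_reported = ["yes", "yes", "yes", "no", "no", "no"]

for label, seq in [
    ("Example 1 actual", example1_actual),
    ("Example 1 reported", example1_reported),
    ("Example 2 actual", example2_actual),
    ("Example 2 reported", example2_reported),
]:
    print(label, count_turnover(seq))
```

In Example 1 the smoothing removes four within-wave changes; in Example 2 it moves a single within-wave change to the between-wave month-pair.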


Although it is impossible with the available
data to evaluate these notions directly, other
research has demonstrated effects which appear
to be related to the processes hypothesized to
be at work here. Goudreau, Oberheu, and Vaughan
(1984) report two results of interest from a
survey of known AFDC recipients. First, those
who  failed  to  report  receipt  were  likely  to 
have  received  AFDC  income  for  only  part  of  the 
reference  period  of  the  survey.  And  second, 
the  most  common  error  in  reporting  income 
amounts  was  the  tendency  to  report  "the  most 
recent  payment  for  all  three  months  of  the 
reference  period  when  payments  actually  varied" 
(p.  184). 

A  second,  related  response  error  possibility 
can  be  examined  using  the  present  data.  Accord- 
ing to  this  explanation,  misreports  of  the  type 
described  above,  while  perhaps  representing  a 
general  human  tendency,  are  even  more  likely 
to  occur  when  the  respondent  and  the  subject 
of  the  report  are  not  the  same  person,  and 


especially  when  different  respondents  provide 
the  data  for  two  consecutive  survey  waves. 
Table  2  summarizes  the  data  regarding  the  role 
of proxy response in general, and changing
respondents specifically, on elevated between-
wave turnover. The results do not present a
simple picture, but there is no evidence that
self-response in consecutive waves erases the
general effect observed in this paper. Note
that with only one exception, all differences
in column (c) are positive; that is, between-
wave turnover is consistently greater than
within-wave turnover even when attention is
restricted  to  the  constant  self-response  group. 

Nor, in fact, is there consistent support
for the weaker argument that self-response might
at least reduce between-wave/within-wave turnover
discrepancies. As shown in columns (j)
and (m), the weight of the evidence is in the
opposite direction. Only for the two earned
income categories does proxy involvement
strongly and consistently produce greater differences
as compared to constant self-response.

Why  the  two  general  income  types  produce  such 
disparate results is not clear. A plausible partial
explanation, at least for the both-self/
mixed-self-and-proxy comparison, is that a true
change in recipiency for earned income also
changes  a  person's  availability  for  interview. 
For  example,  those  who  are  not  employed  may  be 
more  readily  available  to  be  interviewed  for 
self  than  those  who  are  employed.  Receipt  of 
unearned  income,  on  the  other  hand,  is  not 
associated with the likelihood of finding
a  person  at  home;  thus,  recipiency  turnover 
for  unearned  income  is  not  associated  with  a 
corresponding  change  in  response  status. 

Conclusion 

This  paper  has  demonstrated  the  existence  of 
some data quality problems in the 1979 Panel of
the  ISDP,  at  least  when  data  are  examined  from 
more  than  one  survey  wave  at  a  time.  We  have 
as  yet  no  definitive  explanation  for  these 
problems,  but  only  a  list  of  possible  causes: 
edit,  imputation,  and  processing  procedures; 
matching  difficulties;  and  response  errors. 
It  is  likely,  of  course,  that  all  contributed 
to  the  observed  effects. 

Although  modelled  in  many  ways  on  the  1979 
Panel,  the  SIPP  has  adopted  several  modifica- 
tions which  may  reduce  the  problem  of  heightened 
turnover  in  income  recipiency  between  survey 
waves.  First,  the  SIPP  questionnaire  includes 
procedures by which information brought forward
from the previous interview can be verified and
corrected,  if  necessary,  at  the  time  of  inter- 
view. The  identification  and  correction  of 
incorrect  information  was  not  systematically 
addressed  in  the  ISDP.  Second,  the  SIPP  exer- 
cises much  tighter  control  on  the  sample  than 
did  the  ISDP,  through  an  improved  control 
numbering system, and improved check-in procedures
in Census Regional Offices. These new
procedures  should  help  keep  mismatches  to  a 
minimum in linking consecutive survey waves.

In the future, as SIPP data become available
we  will  monitor  them  closely  for  evidence  of 
the  type  of  problem  we  have  demonstrated  here. 
In  addition,  we  will  seek  to  ensure  that  data 
which  might  help  pinpoint  the  cause  of  the  prob- 
lem (for  example,  match  certainty  indicators 
and  edit  and  imputation  flags)  are  systemati- 
cally gathered  and  maintained.  We  are  also 
planning a more active program of investigation:
a record check study matching selected
SIPP  income  receipt  and  amount  data  with  exist- 
ing administrative  records.  Such  a  study  will 
contribute  greatly  to  our  understanding  of  the 
quality  of  SIPP  responses,  and  will  provide 
valuable  direction  to  the  development  of  any 
ameliorative  actions  to  improve  the  quality  of 
the  SIPP. 

Technical  Note  on  Significance  Testing 
Procedures:  The  following  assumptions  guided 
procedures  for  testing  the  significance  of  the 
between-wave  versus  within-wave  difference  in 
turnover  rates: 

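Suppose five observations have common variance
$\sigma^2$ and common correlation $\rho$. Then
the variance of the average of four is

\[
\frac{4\sigma^2 + 12\rho\sigma^2}{16} = \frac{\sigma^2}{4}(1 + 3\rho),
\]

and the variance of the difference between the
fifth observation and the average of the other
four is

\[
\sigma^2 + \frac{\sigma^2}{4}(1 + 3\rho) - 2\rho\sigma^2 = \frac{5}{4}\sigma^2(1 - \rho).
\]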
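
Both results can be verified numerically by summing over the covariance matrix of five equicorrelated observations. The sketch below is only a check of the algebra; the function name and the chosen values of the variance and correlation are arbitrary.

```python
def variance_of_contrast(weights, sigma2, rho):
    """Variance of sum_i w_i * x_i when every x_i has variance sigma2
    and every pair of distinct x_i has correlation rho."""
    total = 0.0
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            cov = sigma2 if i == j else rho * sigma2
            total += wi * wj * cov
    return total


sigma2, rho = 2.0, 0.3  # arbitrary illustrative values

# Average of four observations.
mean_of_four = variance_of_contrast([0.25] * 4, sigma2, rho)
mean_closed_form = sigma2 / 4.0 * (1.0 + 3.0 * rho)

# Fifth observation minus the average of the other four.
difference = variance_of_contrast([-0.25] * 4 + [1.0], sigma2, rho)
diff_closed_form = 1.25 * sigma2 * (1.0 - rho)

print(mean_of_four, mean_closed_form)
print(difference, diff_closed_form)
```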

In  this  illustrative  example,  the  effect  of 
positive  covariance  among  the  estimates  is  to 
reduce  the  variance  below  the  sum  of  the  vari- 
ances of  the  two  components.  For  the  tests  in 
Table  1,  the  variance  of  the  difference  was 
estimated  by 

\[
\mathrm{Var}_{\mathrm{diff}} = \frac{1}{N}\left[\frac{1}{16}\bigl(p_{12}(1-p_{12}) + p_{23}(1-p_{23}) + p_{45}(1-p_{45}) + p_{56}(1-p_{56})\bigr) + p_{34}(1-p_{34})\right]
\]

where $N$ = the number of adult sample persons
in the two consecutive waves and $p_{i(i+1)}$ =
the turnover rate for month-pair $i$ and $i+1$,

which  ignores  all  covariances,  and  thus  is 
likely  to  be  conservative  as  compared  to  the 
illustrative  example. 
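
A minimal sketch of this variance estimate is given below. The `var_diff` function follows the formula above; the `z_statistic` function is an assumption, since the note does not state the exact form of the test statistic, and the turnover rates and sample size are hypothetical.

```python
import math


def var_diff(p12, p23, p34, p45, p56, n):
    """Estimated variance of (between-wave rate - mean within-wave rate),
    ignoring all covariances, as in the formula above."""
    within = sum(p * (1.0 - p) for p in (p12, p23, p45, p56)) / 16.0
    between = p34 * (1.0 - p34)
    return (within + between) / n


def z_statistic(p12, p23, p34, p45, p56, n):
    """Assumed test statistic: the difference divided by its standard error."""
    diff = p34 - (p12 + p23 + p45 + p56) / 4.0
    return diff / math.sqrt(var_diff(p12, p23, p34, p45, p56, n))


# Hypothetical turnover rates and sample size, for illustration only.
rates = dict(p12=0.02, p23=0.02, p34=0.05, p45=0.02, p56=0.02, n=10000)
print(var_diff(**rates), z_statistic(**rates))
```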

FOOTNOTES 

1/ In fact, if the analysis indicated any consistent
tendency, it was quite the opposite
of that proposed by Czajka: less turnover in
the  month-pair  which  linked  the  two  survey 
waves  than  in  those  within  a  single  wave. 


2/ Some  suggestive  evidence  on  the  extent  of 
this problem can be seen in the fact that
about 20 percent of the entries in the
Lepkowski and Kalton tables are of the "no
match" variety, with data available for only
one of the two waves. In fact, it was the
frequency  of  this  outcome  which  led  Czajka  to 
believe  that  the  supposedly  "no  match"  cases 
were  actually  "no  receipt,"  since  the  code 
"occurs  too  often  to  reflect  simply  a  failure 
to  match  records  between  waves"  (Czajka, 
personal  communication,  1983). 

3/ Excluded from the tallies are the special
subsamples of persons selected from lists of
program participants, and persons who were
not  adult  household  members  during  both  of 
the  consecutive  survey  waves.  Sample  weights 
were  not  used  for  the  tallies,  and  all  analy- 
ses used  the  unweighted  survey  data. 

4/ See the Technical Note regarding the procedures
for significance testing.

5/ An explanation is in order regarding the last
column of Table 1. In the design of the 1979
Panel, a randomly selected one-third of the
sample was not administered a wave 4 interview,
but skipped directly from wave 3 to
wave 5. Thus, the first two sets of linked
survey waves (1&2 and 2&3) contain the full
respondent sample, sets 3&4 and 4&5 contain
two-thirds of the sample, and set 3&5 contains
one-third of the sample.


The Student Follow-up Investigation of t
Anthony  M.  Ron-ian,  Diane  V.  O'F 


I. Background


The  Income  Survey  Development  Program  (ISDP) 
was  a  research  and  development  program 
established in the mid-1970's by the Department
of Health, Education and Welfare (HEW) in
conjunction with the U.S. Census Bureau to
prepare for the upcoming Survey of Income and
Program  Participation  (SIPP).   The  SIPP  is  the 
new  survey  conducted  by  the  Census  Bureau 
designed  to  satisfy  a  wide  variety  of  data  needs 
concerning  the  economic  situation  of  persons  and 
families  living  in  the  United  States.   Data 
collection  for  the  first  SIPP  survey,  the  1984 
Panel,  began  October  1983. 

The  major  purposes  of  the  ISDP  were  the  same 
as  the  goals  set  out  for  the  SIPP:  to  improve 
current  estimates  of  income  and  income  change; 
to  extend  the  scope  and  precision  of  policy 
analyses  for  a  wide  range  of  Federal  and  State 
tax and social welfare programs; and to broadly
assess  the  economic  well-being  of  the 
population.1 

The  ISDP  conducted  four  field  tests.   All 
were  experimental  in  nature  as  different 
concepts,  procedures,  questionnaires  and  recall 
periods were tested. The 1979 Research Panel

effort  conducted  by  the  ISDP. 

The  1979  Panel  was  a  nationwide  household 
survey  with  a  total  sample  of  11,800  households 
drawn  from  130  Census  primary  sampling  units 
(PSUs).   Of  this  total,  approximately  9300  cases 
were  selected  from  an  area  sample  and  2500  cases 
were  drawn  from  list  samples.   Data  collection 
began  in  February  1979  and  ran  through  June 
1980.   One-third  of  the  sample  households  were 
interviewed each month during the interview
period. Information was obtained on household
composition, labor force participation, various
sources of money and nonmoney income, taxes,
assets  and  liabilities,  and  other  related  topics. 

The  1979  Panel  included  many  controlled 
experiments  which  tested  alternatives  for  basic 
survey  design.   The  major  tests  conducted  were: 
household  versus  individual  questionnaire 
format;  self  versus  proxy  respondent  rules;  and 
3-month  versus  6-month  respondent  recall. 

As  part  of  the  research  effort  to  test 
respondent  rules,  one  unresolved  issue  concerned 
proxy  interviews  taken  for  college  students  not 
living  at  their  parents'  address.   In  order  to 
test  the  validity  of  information  collected  for 
this  type  of  proxy  interview,  an  experiment  was 
conducted  during  the  November  and  December 
interviews  of  the  1979  ISDP  Panel.   This 
experiment  was  called  the  Student  Followup 
Investigation.   This  paper  discusses  the 
objectives,  design,  and  field  procedures  used 
for  the  investigation,  and  some  preliminary 
results  of  this  experiment. 
II.  Purpose 

Respondent  rules  during  the  1979  Research 
Panel  were  to  conduct  a  personal  interview  for 
each  adult  household  member  16  years  or  older. 
If  a  self-response  interview  could  not  be 
obtained,  the  procedure  was  to  accept  a  proxy 


interview from another household member who was
knowledgeable about the absent person. In this
survey,  as  in  other  Census  surveys,  students 
were  considered  as  members  of  their  parents' 
households  until  they  established  a  permanent 
residence  elsewhere.   Therefore,  the  usual 
procedure  for  students  living  away  from  home 
while  attending  school  was  to  treat  them  as 
household  members  who  were  temporarily  absent 
and  obtain  proxy  interviews  from  other  members 
of their parents' household.

The fourth interview questionnaire
used during the 1979 Panel contained a
set of questions concerning post secondary
educational enrollment and expenses. This
interview seemed especially appropriate for
studying the quality of proxy interviews for
students, as compared to the student's own
interview.

In  order  to  measure  the  accuracy  of 
information  taken  from  proxy  interviews  for 
students  living  away  from  home,  the  fourth 
interview  was  first  obtained  by  proxy  at  the 
parents' household, and then by self interview
at  the  student's  school  residence.   This 
self-response  interview  is  referred  to  as  the 
student  followup  interview. 

There  were  two  basic  purposes  for  conducting 
the  Student  Followup  Investigation: 

1) To obtain the most complete and accurate
information possible for items in the Education
Expenses section of the Wave 4 questionnaire
(such  as  school  enrollment,  tuition,  fees,  and 
living  expenses),  and 

2) To determine whether proxy respondents at
the  sampled  address  are  able  to  provide  reliable 
information  on  labor  force  participation, 
income,  education  expenses  and  enrollment  for 
students living away from home. This experiment
was conducted by comparing the information

The fourth interview
November and December
students living away from home
while  attending  school.   Only  students  who  were 
actually staying at their school residence
(either a dormitory, fraternity house,
apartment, etc.) during the time of the November
or December interview were eligible for followup.

Census regional offices were responsible for
the  control  and  assignment  of  the  student 
followup cases. The rules for assigning the
cases were essentially the same as the ISDP
rules for movers. If the student's school
address was within 50 miles of an ISDP PSU,
an interviewer visited the student for an
interview. Regional offices were instructed to
always employ a different interviewer for the
student's interview in order to eliminate any
interviewer bias. Additionally, interviewers
were instructed to accept only self-response
interviews at the student's school address;
no proxy responses from roommates or friends
were allowed.

IV.  Field  Results 

The  analytic  universe  for  the  study  was  the 
totality  of  students  in  the  4th  Wave  of  the  1979 
Panel  who  usually  lived  away  from  home  and  were 
attending  post  secondary  schools.2   There 
were  443  such  students  identified.   Of  these, 
117  (26.4  percent)  were  not  eligible  for 
interview  since  the  school  residence  was  more 
than 50 miles from an ISDP PSU and 54 (12.2
percent)  were  not  eligible  because  the  student 
was  staying  at  home  during  the  time  of  the  4th 
Wave  interview. 

Of  the  272  cases  assigned,  202  student 
followup  interviews  were  obtained  yielding  a 
response rate of 74.3 percent. Of the 70
noninterviews,  6  were  cases  in  which  the  parents 
refused  permission  for  the  interviewer  to 
contact  the  student. 
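
The counts reported in this section are internally consistent, as the short check below shows. It uses only figures stated in the text; the variable names are illustrative.

```python
identified = 443          # students in the analytic universe
too_far = 117             # school residence more than 50 miles from a PSU
at_home = 54              # staying at home at the time of the interview
interviewed = 202         # student followup interviews obtained

assigned = identified - too_far - at_home
noninterviews = assigned - interviewed

pct_too_far = round(100.0 * too_far / identified, 1)
pct_at_home = round(100.0 * at_home / identified, 1)
response_rate = round(100.0 * interviewed / assigned, 1)

print(assigned, noninterviews, pct_too_far, pct_at_home, response_rate)
```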

The  major  reason  for  the  noninterviews  was 
that  many  students  were  not  staying  at  their 
school  address  (because  of  Thanksgiving, 
Christmas  and  semester  breaks)  by  the  time  the 
interviewer  received  the  followup  assignment. 
Although  interviewers  were  allowed  until  the 
first  week  of  December  to  obtain  the  followup 
interviews  for  students  identified  in  November 
and  until  the  second  week  of  January  for 
students  identified  in  December,  many  students 
remained  on  some  type  of  break  later  into 
December  and  January.   This  proved  to  be  an 
inappropriate  time  of  year  for  conducting 
interviews  with  students  at  their  school 
address.   However,  in  the  case  of  the  1979 
Panel, we overlooked this factor in the survey
design in order to conduct the experiment in
conjunction  with  the  Education  Expenses 
questions,  which  were  set  beforehand  for  the 
Wave  4  interview. 

A  recommendation  for  future  studies  involving 
students  interviewed  at  their  school  address  is 
to  obtain  the  school  address  in  a  previous 
wave's  interview.   This  would  allow  interviewers 
more  time  to  contact  the  student. 
V.   Preliminary  Findings 
A.  Data  Set  Creation 

The  first  task  in  analyzing  these  data  was 
the  creation  of  a  data  set  of  matched  responses 
from  the  student  followup  questionnaire  and  the 
proxy  questionnaire  administered  during  Wave  4 
of the ISDP. During the matching process, 35
students  (17.3  percent)  could  not  be  matched  to 
the  Wave  4  ISDP  File.   Attempts  to  reconcile  the 
mismatches  were  unsuccessful.   In  all  but  one 
instance,  the  most  basic  identifiers  for  these 
35 students did not exist on the Wave 4 ISDP
File.   Due  to  the  time  elapsed  from  the 
initiation  of  this  followup  study  to  the 
creation  of  the  analysis  data  set,  it  has  been 
extremely  difficult  to  find  out  why  these 


be aware of these problems and prevent
them. Omitting the 35 mismatches leaves a
data set of 167 matched responses, which
are analyzed in this report. In all but two
instances, the variables analyzed are direct
responses to questions on the ISDP form (i.e.,
they are not in any way computed). The only
exceptions  are  "usual  hours  worked  per  week  at 
all  jobs"  and  "total  pay  before  deductions  from 
all  jobs  last  month".   These  two  variables  are 
computed  by  summing  the  response  from  each 
reported  job. 

B.  Relationship  of  Student  to  Proxy  Respondent 
The  relationship  of  the  student  to  the 

respondent  serving  as  his/her  proxy  can  be 
determined  in  most  cases  through  their 
relationships  to  the  household  reference 
person.   The  reference  person  is  that  household 
member  who  is  stated  as  owning  or  renting  the 
residence.   Table  1  indicates  that  in  84.4 
percent  of  the  cases,  the  proxy  was  a  parent  of 
the  student.   This  follows  the  expected  pattern. 

C.  Wage  and  Salary  Comparisons 

The  ISDP  questionnaire  was  divided  into 
several  sections.   One  section  was  designed  to 
identify receipt of income types while other
series of wage and salary questions if they
indicated  in  the  recipiency  section  of  the 
questionnaire  that  they  worked  at  a  job  or 
business. One wage and salary record was
created containing responses to the set of wage
and salary questions for each job named. Thus,
if  a  student  had  only  one  employer,  a  wage  and 
salary  record  should  have  been  created  with  the 
student's  responses  while  another  wage  and 
salary  record  should  have  been  created  with  the 
proxy's  responses.   The  reference  period  used  in 
the ISDP was the previous 3 months, but the wage
and  salary  records  were  created  on  a  job  basis. 
Therefore,  a  reported  job  could  have  been  held 
at any time during the 3-month reference
period.   In  examining  the  167  matched  cases  of 
self  and  proxy  responses,  the  following 
breakdown  of  wages  and  salaries  was  observed: 

83  had  at  least  one  self  and  one  proxy  record 

53  had  neither  a  self  nor  a  proxy  record 

27 had a self but no proxy record

4 had a proxy but no self record

If  one  assumes  that  the  self  response  is 
correct,  then  the  proxy  failed  to  identify  a  job 
held by the student in 27 cases (24.5 percent).
This  appears  to  be  rather  substantial  and 
indicates  a  potential  source  of  underreporting 
of  wages  and  salaries  with  proxy  response.   The 
4  cases  in  which  a  proxy  record  exists  while  no 
self  record  exists  may  be  interpreted  as  a 
potential source of misreporting wages and
salaries  under  proxy  response. 
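
The 24.5 percent figure takes as its base the students who reported a job themselves, not all 167 matched cases. A small sketch of the calculation follows; the function and variable names are illustrative.

```python
def proxy_miss_rate(both, self_only):
    """Share of self-reported cases that the proxy failed to report,
    treating the self response as correct."""
    return self_only / (both + self_only)


both, neither, self_only, proxy_only = 83, 53, 27, 4
matched_cases = both + neither + self_only + proxy_only
miss_pct = round(100.0 * proxy_miss_rate(both, self_only), 1)

print(matched_cases, miss_pct)
```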

In  attempting  to  analyze  particular  wage  and 
salary  questions  of  interest,  several  conditions 
must  be  kept  in  mind.   While  83  matched  cases 
exist with both a self and proxy wage and salary
record,  the  number  of  cases  available  for  making 
comparisons  for  any  particular  question  may  be 
less. There are two primary reasons for this:
1) one interview may have proceeded in a fashion
patterns within the questionnaire
though the question of interest may have been
asked  during  both  interviews,  one  may  have 
resulted  in  a  valued  response3  while  the  other 
did  not.   Valued  responses  are  important  in 
evaluating  the  quality  of  data  obtained  in  a 
survey.   They  indicate  both  knowledge  by  the 
respondent  of  the  investigated  subject  matter 
and  willingness  to  cooperate  in  the  survey. 

With  this  in  mind,  the  percentages  of  coded 
responses  which  were  valued  (i.e.,  given  that  a 
question  was  asked,  the  number  of  times  it 
resulted  in  a  valued  response)  are  presented  in 
Table  2.   It  is  seen  that  for  several  wage  and 
salary  questions,  it  is  more  likely  that  a  self 
respondent  will  give  a  valued  response.   This  is 
particularly  evident  with  the  "usual  hours 
worked  per  week"  and  "hourly  rate  of  pay" 
variables.   In  all  but  one  instance,  when  a 
valued  response  was  not  given,  a  "don't  know" 
was  the  recorded  response. 

Table  2  also  presents  the  mean  value  of  self 
responses  for  seven  wage  and  salary  variables 
for  three  particular  categories: 

1)  the  proxy  could  not  identify  that  the 
student had a job (i.e., no proxy wage and
salary record existed but a self wage and salary
record did exist);

2)  the  self  response  was  valued  while  the 
proxy  response  was  not  (e.g.,  the  proxy  most 
likely  responded  "don't  know"),  and 

3)  both  self  and  proxy  responses  were  valued. 
This  table  demonstrates  that  a  pattern 

appears  to  exist  in  which  proxies  best  identify 
jobs at which students earn the most money or
work  the  most  hours.   The  smaller  the  earnings 
or  hours  worked,  the  more  likely  the  proxy  will 
either  not  be  able  to  identify  the  job  or  not  be 
able  to  answer  detailed  questions  about  the  job. 

Several  points  should  be  noted  concerning 
Table  2.   The  usual  hours  worked  per  week  may 
seem  rather  high  for  student  jobs.   This  is  due 
to  the  reference  period  for  these  questions 
extending  back  into  the  summer  months. 
Therefore,  summer  jobs  in  which  the  student  may 
have  worked  40  or  more  hours  per  week  will  be 
included  in  these  summaries.  This  also  explains 
the  decreases  in  total  monthly  pay  from  three 
months  ago  to  last  month.   Also,  it  is 
impossible  to  compute  total  monthly  pay  by  using 
the  usual  hours  worked  per  week  and  regular 
hourly  rate  of  pay.   This  is  because  the  values 
presented  in  these  tables  are  means  and  concern 
the  student's  primary  job.   One  student's 
primary  job  may  have  been  three  months  ago  while 
another's  may  have  been  last  month. 

The  final  table,  Table  3,  presents 
comparisons  of  the  self  and  proxy  valued 
responses.   It  should  be  noted  that  the 
estimated  variances  used  in  computing  these 
confidence  intervals  do  not  take  into  account 
any  sample  design  effects.   The  reason  is  that 
this  analysis  is  considered  preliminary  and  will 
be used to decide if a more lengthy detailed
a small degree of accuracy to results from this
study,  but  it  should  be  noted  for  future 
studies,  that  increased  emphasis  on  obtaining 
responses  from  all  sample  students  and  their 
proxies  would  greatly  enhance  the  accuracy  of 
results.   Of  the  seven  wage  and  salary  variables 
analyzed,  two  showed  a  significant  difference  at 
the  .05  level.   These  were  "usual  hours  worked 
per  week"  and  "regular  hourly  rate  of  pay",  both 
for  the  student's  primary  job.   In  both 
instances,  the  proxy  gave  the  larger  valued  mean 
response. It is interesting to note that for
"usual hours worked per week at all jobs", the
mean  self  and  proxy  responses  are  not 
significantly  different.   This  raises  the 
question  of  the  proxy  and  student  possibly 
identifying  different  jobs  as  being  primary. 
D.   Education  Expenditure  Comparisons 

All 167 matched cases had both a self and a
proxy education expenditures record, but 61 of
these  records  were  unavailable  for  this 
preliminary  analysis.   This  was  due  to  a  flaw 
discovered  in  the  manner  in  which  the  Wave  4 
ISDP  data  were  processed.   Only  rekeying  of  the 
questionnaires could retrieve the data and this
was  deemed  unwarranted  at  the  present  time. 
Therefore,  106  matched  records  were  available 
for  analyzing  education  expenditures. 

Table  2  again  presents  the  percentage  of 
responses  which  were  valued.   It  is  obvious  that 
a  valued  response  is  much  more  likely  from  a 
self  respondent  than  a  proxy  respondent.  This 
seems understandable for all variables except
"amount paid by family on tuition and fees"
since  the  other  variables  involve  expenditures 
most  likely  handled  directly  by  the  student.   In 
every  instance  that  a  valued  response  was  not 
given,  "don't  know"  was  the  recorded  response. 

Table  2  displays  the  mean  value  of  self 
responses  both  when  the  proxy  has  a  valued 
response  and  also  when  the  proxy  response  is 
"don't  know".   Three  of  the  four  variables 
considered  do  not  appear  to  differ  substantially 
between  these  two  categories.   Only  the  "amount 
paid  by  family  on  tuition  and  fees"  exhibits  a 
rather large difference with the mean self
response  being  greater  if  the  proxy  has  a  valued 
response.   This  is  consistent  with  the  wage  and 
salary  results  in  that  the  more  expensive  the 
tuition,  the  more  the  proxy  is  likely  to  know 
about  the  amount.   It  may  also  help  explain  why 
so  many  "don't  knows"  were  given  by  proxies  in 
response  to  this  question.   Perhaps  when  the 
amount  of  tuition  is  low,  the  student  is  more 
likely  to  be  directly  involved  in  its 
payment (e.g., the student may pay the tuition
from  support  supplied  by  the  parent). 

Table  3  again  presents  results  of  comparisons 
of  self  and  proxy  valued  responses.   Two  of  the 
four  variables  showed  a  significant  difference 
at  the  .05  level.   They  were  "academic  credit 
hours taken this term" and "cost of course
materials". In both instances, the mean proxy
received. These were: Basic Educational
Opportunity  Grants  (31  cases)  and  Government 
Scholarships, Fellowships, Etc. (11 cases). The
results  of  comparisons  of  self  and  proxy  valued 
responses  are  shown  in  Table  3.  No  significant 
differences  in  mean  amount  received  were  found 
for any of the assistance variables.

The  last  area  investigated  was  receipt  of 
interest  income.  Reporting  of  interest  was 
handled in the ISDP questionnaire in the same

104 cases had both a self and proxy report
30  cases  had  neither  a  self  nor  proxy  report 
27  cases  had  a  self  but  no  proxy  report 
6  cases  had  a  proxy  but  no  self  report 

Assuming  the  self  response  is  correct,  the 
proxy  failed  to  identify  that  the  student  would 
have  interest  income  in  27  cases  (20.6 
percent).   Although  this  appears  to  be  a  large 
problem,  interest  income  is  poorly  reported  for 
all  people.   For  example,  in  the  104  cases  in 
which  both  the  self  and  proxy  respondent 
reported  receipt  of  interest  earned  on  the 
student's own accounts, 61.0 percent of the
coded self responses were "don't know" while
81.5  percent  of  the  coded  proxy  responses  were 
"don't  know".   Considering  the  question  on 
interest  earned  on  the  student's  shared 
accounts,  69.4  percent  of  the  coded  self 
responses  and  80.0  percent  of  the  coded  proxy 
responses were "don't know". The quality of
interest data for students thus appears
suspect regardless of whether a self
or proxy interview is conducted.
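
The 20.6 percent figure can be reproduced from the case counts. This is a minimal sketch; it assumes the number of cases with both a self and a proxy report is the 104 quoted later in the same paragraph, which gives 131 cases with a self report in total.

```python
# Case counts from the text; `both` is assumed to be the 104 cases
# mentioned later in the paragraph.
both, neither, self_only, proxy_only = 104, 30, 27, 6

# Cases in which the student reported interest income.
self_reports = both + self_only

# Share of those cases in which the proxy reported no interest income.
missed_by_proxy = 100 * self_only / self_reports

print(f"{missed_by_proxy:.1f} percent of self-reported cases had no proxy report")
```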
VI.  Conclusions 

The  aim  of  this  preliminary  analysis  was  to 
examine  the  self  and  proxy  student  data  in  order 
to  decide  if  a  more  extensive  investigation 
(e.g.,  effects  of  accepting  proxy  responses  on 
overall survey estimates) seemed warranted. Anyone
drawing inferences from these data should keep in
mind that the estimated variances did not
reflect any sample design effects and that the
data set is quite small. Indeed,
most comparisons were based on fewer than 100
observations. Still, this study is unique and,
although somewhat flawed in administration and
implementation, it supports certain
general remarks. When valued responses are
available from both the self and proxy
interviews, the quality of the proxy responses
appears to be generally quite good.


Substantially  more  data  would  be  needed  to 
derive  better  estimates  of  the  difference 
between  self  and  proxy  response  and  to  narrow 
the  confidence  intervals  around  these  estimates. 
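
As a rough guide to how much more data would be needed, the half-width of a confidence interval shrinks roughly in proportion to 1/sqrt(n) when variability is unchanged. The sketch below applies that rule to the "usual hours worked per week at primary job" comparison (limits of -9.03 and -0.91 hours, based on 37 pairs). It assumes simple random sampling with no design effects, which is the same simplification noted above for the estimated variances. The function name `required_n` is hypothetical.

```python
import math


def required_n(current_n: int, current_half_width: float, target_half_width: float) -> int:
    """Sample size needed for a target half-width, assuming half-width ~ 1/sqrt(n)."""
    return math.ceil(current_n * (current_half_width / target_half_width) ** 2)


# Half-width of the interval for usual hours at the primary job (Table 3).
half_width = (-0.91 - (-9.03)) / 2

print(round(half_width, 2))
print(required_n(37, half_width, 2.0))
```

Under these assumptions, narrowing the interval from about 4 hours to 2 hours on either side would take roughly four times as many matched pairs.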

A  problem  that  does  appear  to  exist  is  in 
obtaining  a  valued  proxy  response.   Quite  often, 
a  proxy  cannot  identify  a  particular  source  of 
student  income  (e.g.,  wages  and  salaries)  and 
even if they can identify it, they are more
likely than the student to respond "don't know" to the
particulars about that source. A trend does
seem to exist that the larger the income or
expense, the better the proxy response becomes.
Still, this implies that when proxy
responses are used, amounts in the lower range of income or expense
are more likely to be missed.

Finally,  the  main  issues  involved  in 
interviewing  students  away  from  home  are  the 
impact  of  accepting  proxies  on  overall  survey 
estimates  and  the  differential  costs  involved  in 
obtaining  self  responses.   Since  no  cost  data 
are  available  from  this  study,  an  estimate  of 
the  additional  amount  required  in  obtaining  self 
responses  cannot  be  computed.   It  may  be 
possible  to  make  some  very  general  comments 
about  the  potential  impact  of  accepting  proxies 
on  overall  survey  estimates.   Students  living 
away  from  home  make  up  less  than  3  percent  of 
the  overall  ISDP  sample.   With  this  in  mind  and 
the  fact  that  results  from  this  study  indicate 
that  proxies  are  more  likely  to  miss  only  the 
smaller  expense  and  income  amounts,  it  may 
appear  unlikely  that  overall  survey  estimates 
will  be  strongly  affected.   Still,  the 
limitations  of  the  sample  involved  in  this  study 
must  be  considered  in  any  statement  of  results. 
For  instance,  students  living  more  than  50  miles 
from  an  ISDP  PSU  were  omitted  from 
consideration.   Also,  problems  were  encountered 
in  matching  students  to  proxies  and  in  losing 
some  survey  data  due  to  a  processing  flaw.   The 
effect  that  these  students  could  have  had  on 
results from this study is unknown. In
conclusion, further detailed investigation of
this  particular  data  set  is  not  recommended  due 
to  the  limitations  in  the  size  and  composition 
of the sample. Future study may lead to
stronger results, but based upon this preliminary
investigation,  it  is  recommended  that  while  the 
self-proxy  student  issue  should  not  be 
forgotten,  it  should  not  occupy  a  high  place  on 
the  SIPP  research  agenda. 

FOOTNOTES 

1  Research  Triangle  Institute.  1983.   The  1979 
ISDP Research Panel Documentation. National
Technical  Information  Service,  Washington,  D.C. 

2  Since  Wave  4  of  the  1979  Panel  was 
administered  over  a  two  month  period,  only 
two-thirds  of  the  11,800  household  sample  was 
interviewed,  making  the  Wave  4  sample  size 
approximately  8,100  households. 

3  Throughout  this  report,  the  term  valued 
response  is  used  to  imply  any  response  with  a 
legitimate  value  for  the  question  asked.   Valued 
responses  do  not  include  refusals,  don't  knows, 
or responses whose value is considered out of

Table 2:  Results for Wage & Salary and Education Expend...

                                % of coded cases      Mean value of self responses when:
                                which were valued:
                                                      proxy could    proxy did     both self
                                                      not identify   not give      and proxy
                                self       proxy      that student   a valued      had valued
Variable                        response   response   had a job      response      responses

Wage and Salary

Usual hours worked per week     93.7%      76.4%      22.30 hrs      21.11 hrs     35.60 hrs
  at primary job                (n=76)     (n=55)     (n=20)         (n=38)        (n=37)

Regular hourly rate of pay      100.0      73.3       $3.17/hr       $3.46/hr      $3.39/hr
  at primary job                (n=66)     (n=75)     (n=16)         (n=18)        (n=48)

Total pay before deductions     100.0      100.0      $111.81        $246.48       $378.42
  from primary job 3 months ago (n=60)     (n=51)     (n=16)         (n=27)        (n=33)

Total pay before deductions     100.0      100.0      $32.19         $97.82        $138.33
  from primary job 2 months ago (n=61)     (n=50)     (n=16)         (n=28)        (n=33)

Total pay before deductions     100.0      98.0       $37.25         $74.83        $100.00
  from primary job last month   (n=61)     (n=50)     (n=16)         (n=29)        (n=32)

Usual hours worked per week     -          -          25.76 hrs      24.75 hrs     41.37 hrs
  at all jobs                                         (n=21)         (n=37)        (n=38)

$94.47 (n=17)    $131.1C (n=63)    (n=51)

Variable                        Mean self    Mean proxy   Difference   Lower limit  Upper limit

Wage and Salary

Usual hours worked per week     35.60 hrs    40.57 hrs    -4.97 hrs*   -9.03 hrs    -0.91 hrs
  at primary job
Regular hourly rate of pay      $3.39/hr     $3.54/hr     -$.15/hr*    -$.29/hr     -$.01/hr
  at primary job
Total pay before deductions     $378.42      $336.09      $42.33       -$56.80      $141.46
  from primary job 3 months ago
Total pay before deductions     $138.33      $121.52      $16.81       -$30.79      $64.41
  from primary job 2 months ago
Total pay before deductions     $100.00      $106.56      -$6.56       -$22.56      $9.44
  from primary job last month
Usual hours worked per week     41.37 hrs    40.55 hrs    0.82 hrs     -3.55 hrs    5.19 hrs
  at all jobs
Total pay before deductions
  from all jobs last ...
$.15
$21.35
-1.93 hrs    -0.47 hrs
$1004.10     $1157.63     -$153.53     -$591.87     $284.31
$98.75       -$22.09*
-$16.87      $36.39

BEOG ...
BEOG ... 2 ...
BEOG ... la...
Government s...
Government s...
Government s...

* implies differ...
These limits ... design effect...

$299.61      $381.74      -$82.13      -$281.99     $117.
$194.48      $301.24      -$106.76     -$302.28     $83.
$50.57       71

THE ISDP 1979 RESEARCH PANEL AS A METHODOLOGICAL SURVEY: IMPLICATIONS FOR SUBSTANTIVE ANALYSIS


Richard A. Kulka, Research Triangle Institute

The  1979  Research  Panel  was  the  third  of 
three  major  field  tests  conducted  by  the  Income 
Survey  Development  Program  (ISDP)  to  fulfill 
its  mandate  to  examine  and  resolve  the  tech- 
nical and  operational  problems  involved  in  the 
design  and  implementation  of  a  survey  mechanism 
that  would  meet  the  needs  for  improved  data  on 
income,  assets,  and  program  participation  for 
program  and  policy  analysis.  As  the  most 
proximal  and  realistic  pilot  or  prototype  for 
the  Survey  of  Income  and  Program  Participation 
(SIPP),  the  1979  Research  Panel  employed  a 
longitudinal  panel  design,  whereby  persons  at 
sample  addresses  were  contacted  early  in  the 
calendar  year  and  recontacted  at  regular  inter- 
vals (usually  every  three  months)  and  asked 
about  their  income  and  other  characteristics 
for  the  preceding  few  months.  In  this  way,  a 
highly  detailed  record  was  built  up  for  each 
person  for  the  entire  calendar  year  according 
to  the  schedule  presented  in  Table  1.  In 
addition,  since  less  time  was  required  to 
update  basic  information  after  the  initial 
interview,  time  was  available  in  later  waves  of 
interviewing  to  ask  additional  questions  or 
questionnaire  modules  on:  (a)  topics  of  inter- 
est that  were  stable  enough  not  to  require 
updating  on  each  visit;  and  (b)  emerging  issues 
of  special  interest  to  particular  agencies  or 
programs.  As  a  result  of  the  diligent  applica- 
tion of  this  modular  approach,  the  fully-imple- 
mented 1979  Research  Panel  produced  an  over- 
whelming array  of  data  suitable  for  longitudi- 
nal analysis  and  a  corresponding  wealth  of  de- 
tailed socioeconomic  data  on  more  specialized 
issues suitable for complex cross-sectional analysis.

Table 1
1979 Research Panel Interview Months and Reference Period

As the largest and most "complete" of the ISDP field tests, the 1979 Research
Panel is of sufficient design, size, and
interest  to  occupy  social  and  policy  analysts 
for  many  years  to  come,  and  its  use  for  such 
substantive  and  policy  research  has  been 
actively  promoted  (e.g.,  David,  1983;  Kasprzyk, 
1983a).  It  is  important  to  keep  in  mind, 
however, that the fundamental purpose of the
ISDP field tests, including the "state-of-the-
art" 1979 Research Panel, was methodological:
to serve as a flexible vehicle for conducting
field  experiments  and  feasibility  tests  to 
evaluate  the  effectiveness  of  alternative 
design  features  and  data  collection  strategies 
(Ycas and Lininger, 1981). Hence, the 1979
Research  Panel  is  replete  with  such  methodolog- 
ical assessments,  and  potential  users  of  these 
data  should  be  aware  of  their  nature  for  at 
least  two  reasons.  First,  since  most  of  these 
tests  and  experiments  have  not  been  fully 
analyzed  or  evaluated  (some  not  at  all),  the 
potential  for  methodological  as  well  as  sub- 
stantive analysis  of  these  data  is  very  great 
(cf.  David,  1983).  Second,  since  these  method- 
ological assessments  are  an  integral  part  of 
the  total  survey  design,  they  bear  directly  in 
some  cases  on  the  likely  quality  of  the  data 
for  substantive  analysis,  including:  (a)  poten- 
tial differences  in  the  nature  or  quality  of 
data  collected  under  different  experimental 
variations;  (b)  the  confidence  one  can  have  in 
the  accuracy  of  certain  data;  and  (c)  the 
extent  to  which  data  collected  under  different 
procedures  can  legitimately  be  conceptually 
merged  for  certain  analyses  rather  than  ana- 
lyzed separately. 

OVERVIEW  OF  TESTS  AND  EXPERIMENTS 

With  the  latter  especially  in  mind,  I  have 
summarized  in  Table  2  the  full  range  of  ex- 
plicit methodological  research  embedded  in  the 
1979  Research  Panel.  Although  virtually  every 
aspect  of  the  1979  Research  Panel  was  subject 
to  methodological  scrutiny  and  evaluation,  in 
particular,  five  formal,  controlled  experi- 
mental comparisons  of  alternative  design  or 
data  collection  strategies  were  systematically 
incorporated  in  the  survey,  and  seven  other 
procedures  were  explicitly  included  to  provide 
a  focused  nonexperimental  assessment  of  their 
feasibility  for  implementation  in  the  SIPP. 
For  each  of  these  12  tests  and  experiments  a 
brief  description  is  provided,  along  with  its 
basic  design,  a  capsule  summary  of  results,  and 
a  note  on  its  possible  implications  (if  any) 
for  substantive  analysis. 
Controlled  Experiments 

The  first  of  five  formal  experiments  included 
in  the  1979  Research  Panel  compared  two  alter- 
native questionnaire  formats  for  measuring 
income  recipiency,  one  using  a  "household 
screening  approach"  to  determine  receipt  of 
various  kinds  of  income  and  the  other  a  more 
conventional  person-by-person  "individual" 
approach.  It  was  hoped  that  the  former  ap- 
proach would  reduce  the  time  needed  to  adminis- 
ter the  questionnaire  without  a  corresponding 
reduction  in  data  quality.  Preliminary  analy- 
ses by  Coder  (1980)  and  Kaluzny  (1981)  indi- 
cated few  differences  between  the  two  approach- 
es in  estimates  of  income  recipiency  rates  by 
type,  and  only  a  slightly  higher  incidence  of 
"don't  knows"  and  "refusals"  under  the  house- 
hold screening  approach,  but  the  average 
savings per household was only about five

any such potential "form
n the "household screening approach" compelled a higher level of
"proxy"  rather  than  self-reporting  for  income 
recipiency  than  under  the  individual  approach 
(even  under  standard  respondent  rules),  and 
because  a  few  questions  on  income  receipt  that 
could  not  be  covered  on  a  household  basis 
(e.g.,  verification  of  labor  force,  retirement, 
and  disability  status  and  Medicare  and  Medicaid 
coverage),  and  all  questions  concerning  amounts 
of  income  received,  were  asked  on  a  person-by- 
person  basis  even  under  the  household  screening 
approach,  there  is  a  distinct  possibility  that 
different  types  of  income  data  may  have  been 
reported  differentially  under  that  approach  but 
not  when  the  individual  approach  was  used. 
Moreover,  although  these  alternative  forms  were 
used  only  during  the  first  wave  of  interview- 

"standard" respondent
interviews  were  accepted  for  absent  persons 
from  other  household  members  when  convenient. 
A  number  of  different  analyses  have  been  con- 
ducted to  date  in  an  effort  to  study  the  ef- 
fects of  these  proxy  respondent  rules  and 
self-respondent  rules  on  data  quality,  non- 
interview  rates,  and  costs  of  data  collection 
(e.g.,  Coder,  1980;  Kaluzny,  1981,  1982;  Kulka, 
1983). In general, while the use of self-
response rules results in approximately 20
percentage points more self-response (85 vs. 65 percent)
and  4-6  percent  higher  interviewing  costs  than 
standard  respondent  rules,  results  on  nonre- 
sponse  and  data  quality  are  mixed.  While  the 
proxy  treatment  had  a  positive  effect  on  house- 
hold and  person  interview  rates,  self-respon- 
dent rules  apparently  resulted  in  somewhat 
better  data  (as  implied,  for  example,  by  the 
greater  use  of  records,  lower  item  nonresponse 
for  certain  key  items,  less  rounding,  and  less 
variance  in  non-zero  amounts),  although  some  of 
these  effects  appeared  to  be  somewhat  smaller 
by  Wave  2  (Kaluzny,  1982). 
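
The 85 vs. 65 percent self-response figures can be read in two ways, and the distinction matters when they are set beside the 4-6 percent cost increase. The short sketch below is illustrative only and computes both readings. The function name `compare_rates` is hypothetical.

```python
def compare_rates(treatment: float, control: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative increase in percent)."""
    point_gap = treatment - control
    relative_gain = 100 * (treatment - control) / control
    return point_gap, relative_gain


# Self-response rates under self-response rules and standard respondent rules.
points, relative = compare_rates(85.0, 65.0)

print(f"{points:.0f} percentage points, {relative:.1f} percent relative increase")
```

The gap is 20 percentage points, which is a relative increase of about 31 percent over the standard-rules rate.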

Unlike  the  "forms"  experiment,  the  self-proxy 

influences on data quality or increases in the
variance of key variables due to this experimental
factor are likely to be found throughout
the database. Yet, with few exceptions, the
longitudinal implications of these alternate
respondent rules have not yet been investigated.
The proportions of proxy
respondents, or the characteristics of proxy
vs. self-responders, under the two conditions
may vary over time, thereby somewhat confounding
longitudinal analyses of variables especially
sensitive to respondent rules. More
generally,  suppose  that  a  comparison  of  the 
1979  Research  Panel  to  other  survey  data  sug- 
gested that  the  former  provided  more  accurate 
data  relative  to  an  independent  source  of 
information.  Suppose  further  that  this  im- 
provement was  directly  attributable  to  the  use 
of  a  maximum  self-response  rule  (i.e.,  under 
regular  proxy  rules  estimates  were  similar). 
Without  making  such  an  assessment,  however,  one 
might  assume  that  in  general  the  ISDP  design 
results  in  better  data  and  generalize  this 
assumption  to  the  SIPP,  an  erroneous  presump- 
tion, of  course,  unless  the  SIPP  were  to  employ 
rules  maximizing  self-response.  Moreover,  it 
is  not  difficult  to  see  how  such  a  methods 
"artifact"  might  similarly  influence  important 
relationships  among  variables  of  major  policy 

The  third  experiment  compared  property  or 
asset income amounts reported using a three-
month reference period with those reported for a
six-month recall period. The basic objective
of this experiment was to determine if infor-
mation collected every six months would be as accurate as
that collected quarterly. Results of this
experiment  would  provide  evidence  on  the  magni- 
tude of  loss  with  the  longer  recall  period  (a 
critical  ingredient  in  justifying  the  current 
four-month  recall  design  for  the  SIPP),  but 
very  little  of  this  analysis  has  been  done  to 
date  (cf.  Czajka,  1983). 

The  very  reason  for  conducting  this  experi- 
ment, however,  implies  increased  variation  in 
reported  amounts  of  asset  or  property  income 
due  to  differences  in  length  of  recall  period. 
Since  the  preceding  three  months  are  reported 
with  an  identical  recall  period  by  both  groups 
every  other  wave,  the  influence  is  not  constant 
for  all  months.  Thus,  substantive  analyses  of 
a "common" three-month period may yield differ-
ent results than analyses of a similar period where
recall  is  three  months  longer  for  one  group 
than  the  other.  Similarly,  quarter-to-quarter 
variation  in  asset  income  reporting  may  be 
greater  within  the  six-month  reporting  group 
than  within  the  three-month  subsample.  More- 
over, since  asset  income  recipiency  is  reported 
quarterly,  the  expected  influences  would  likely 
be  on  asset  income  amounts,  but  a  longer  re- 
porting period  for  "amounts"  could  also  have  an 
indirect  adverse  effect  on  reports  of  recipi- 
ency as  well.  In  addition,  if  a  recall  effect 
of  either  type  is  present,  such  effects  may 
either  dissipate  or  increase  in  magnitude  over 
the  life  of  the  panel  (through  Wave  5). 

A  fourth  experiment,  afforded  by  the  use  of  a 
"staggered"  interview  design  in  which  each 
quarter's  interviewing  was  spread  over  three 
months  with  a  variable  three-month  reference 
period  (see  Table  1),  provided  for  a  systematic 
comparison  of  income  and  other  information 
reported  for  several  months  during  the  year 
using  a  one-,  two-,  or  three-month  reference 
period.   Although  the  staggered  design  was  not 


adopted  for  this  reason,  it  provides  a  "na- 
tural" experimental  design  for  the  assessment 
of  potential  monthly  recall  bias  by  length  of 
reporting  period  for  virtually  all  income  types 
and  a  wide  variety  of  other  variables.  To 
date,  however,  only  preliminary  analyses  of 
this  natural  recall  experiment  have  been  con- 
ducted (Kaluzny,  1981,  1982;  Czajka,  1982), 
none  of  which  have  provided  consistent  evidence 

implications of this month-
ly interview design for substantive analyses
are  considerable.  On  the  positive  side,  the 
staggered  interview  procedure  provides  an 
ongoing  measure  of  monthly  recall  bias,  and  to 
the  extent  that  such  bias  exists,  the  varied 
recall  period  tends  to  minimize  its  effect 
(relative  to  more  typical  quarterly  interview- 
ing) when  making  comparisons  of  monthly 
changes,  since  income  and  other  monthly  data 
were  always  collected  with  the  same  average 
length  of  recall.  On  the  other  hand,  the 
staggered  approach  introduces  some  substantial 
problems with regard to missing data and re-
sponse variance for monthly

are made with higher vai
gered approach requires that calendar quarter
estimates  for  two  thirds  of  the  sample  be 
derived  from  data  collected  in  two  separate 
interviews,  resulting  in  greater  levels  of 
missing  data,  linkage  problems,  and  increased 
month-to-month  variation  within  quarters.  For 
example,  recent  analyses  of  data  from  the  1979 
Research  Panel  indicate  a  degree  of  variation 
in  quarterly  earnings  greater  than  seems  rea- 
sonable, and  month-to-month  changes  in  income 
ally  tend  to  be  greater  between 
in  the  reference  period  re- 

d  within  each  interview  (David,  1983:11; 
and  Kasprzyk,  1984). 
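
The point that the staggered design collects every calendar month at the same average length of recall can be illustrated with a small enumeration. This is a sketch under stated assumptions: months are indexed with 1 = January 1979, Wave 1 Group 1 is interviewed in February, the three groups are interviewed in consecutive months within each wave, and each interview covers the three preceding months. Only the first three waves are enumerated, and the function names are hypothetical.

```python
from collections import defaultdict


def interview_month(wave: int, group: int) -> int:
    # Assumed indexing: month 1 = January 1979; Wave 1, Group 1 = February.
    return 2 + (group - 1) + 3 * (wave - 1)


def recall_by_reference_month(waves: range, groups: range = range(1, 4)) -> dict[int, list[int]]:
    """Collect the recall lengths (in months) at which each reference month is reported."""
    recalls = defaultdict(list)
    for wave in waves:
        for group in groups:
            month = interview_month(wave, group)
            for lag in (1, 2, 3):
                # A reference month `lag` months before the interview.
                recalls[month - lag].append(lag)
    return dict(recalls)


recalls = recall_by_reference_month(range(1, 4))

# January through July 1979 are each covered by all three groups.
for month in range(1, 8):
    lags = sorted(recalls[month])
    print(month, lags, sum(lags) / len(lags))
```

Each fully covered month is reported once at one, two, and three months of recall, so the average recall is two months throughout. The same enumeration shows that any one group reports a calendar quarter from two separate interviews unless its interview falls immediately after the quarter ends.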

Feasibility Tests

In  addition  to  the  formal  experimental  com- 
parison of  self-respondent  versus  proxy  respon- 
dent rules,  two  other  more  specialized  respon- 
dent tests  were  carried  out.  One  examined  the 
accuracy  of  information  collected  for  students 
living  away  at  school  during  the  interview 
period by administering the fourth wave ques-
tionnaire twice for absent students -- once by
proxy  at  the  parents'  address  and  a  second  time 
in  person  at  the  school  address.  The  basic 
objective  of  this  study  was  to  evaluate  both 
differences  in  reporting  and  the  additional 
burden  imposed  on  field  staff  when  students 
were  followed  to  their  temporary  addresses. 
With  regard  to  the  latter,  over  one-fourth  of 
the  students  identified  lived  at  a  school 
address  outside  the  sample  area,  and  of  those 
assigned  for  follow-up,  only  74  percent  were 
interviewed,  with  most  nonresponse  due  to 
inability  to  contact  respondents  at  their 
school  addresses.  Preliminary  analyses  of  data 
from  the  students  interviewed  indicate  that 
when  "amounts"  or  details  are  available  from 
both  the  self  and  proxy  interviews  the  quality 
of  proxy  responses  is  generally  quite  good,  but 
proxy  respondents  are  frequently  unable  to 
provide  a  "valued  response"  at  all  (cf.  Roman 
and  O'Brien,  1984).  In  general,  then,  proxy 
data  obtained  for  college  students  are  clearly 
somewhat  incomplete,  but  most  analyses  of  data 
from  the  1979  Research  Panel  should  not  be 
greatly  influenced  by  these  deficiencies,  with 
the  possible  exception  of  those  which  rely  on 
special  subsamples  containing  a  large  propor- 
tion of  college  students  and  focus  on  variables 
especially  prone  to  such  proxy  reporting  error. 

The  second  respondent  test  examined  the 
feasibility  of  using  off-line  mail-back  surveys 
for  obtaining  quarterly  estimates  of  nonfarm 
self-employment  income  from  respondents  owning 
a  business  or  professional  practice.  Because 
of  poor  response  rates,  this  particular  effort 
to  measure  subannual  self-employment  income  was 
abandoned  after  the  second  quarter.  Although 
some  substantive  analyses  have  been  conducted 
using  these  data  (e.g.,  Whiteman,  1983),  meth- 
odological analysis  took  the  form  of  additional 
experimentation  with  alternative  procedures  in 
an  effort  to  improve  this  performance,  none  of 
which  were  very  successful.  The  major  implica- 
tion of  this  feasibility  test  for  social  or 
policy  analysts  is  that  data  on  subannual 
self-employment  income  collected  in  the  1979 
Research  Panel  are  generally  regarded  as  defi- 
cient. 

The  staggered  interview  design  (mentioned 
earlier), which roughly tripled each inter-
viewer's experience with a form, was itself a
feasibility  study.  In  addition  to  routine 
quality  control  interviews,  an  expanded  re- 
interview  program  was  initiated  to  determine 
whether  such  increased  interviewer  experience 
with  the  questionnaire  and  with  the  survey  in 

conducted to date provides little support for
the  proposition  that  monthly  interviewing 
resulted  in  substantially  improved  field  per- 
formance or  data  quality.   Should  such  differ- 

conducted the first week of the calendar quar-
ter, for example.

Two other feasibility tests incorporated in
the 1979 Research Panel were designed to explore
issues related to linkage of survey
responses with data in administrative records
systems.  First,  since  the  Social  Security 
Number  (SSN)  is  the  identifier  in  most  general 
use,  a  project  to  determine  valid  SSN's  for 
sample  households  was  conducted  using  two 
rounds  of  validation,  both  including  a  computer 
match  and  manual  search  of  Social  Security 
Administration  (SSA)  administrative  records. 
Through the use of those procedures and ex-
ploiting the panel design to obtain corrected
SSN's  from  the  field  in  later  interview  waves, 
valid  SSN's  were  determined  for  95.5  percent  of 
the  cases  included  in  the  project,  a  rate  that 
might  be  improved  with  minor  modifications  in 
the future (Kasprzyk, 1983b). As a result,
should  access  to  administrative  records  systems 
be  granted,  substantive  analysis  using  survey 
information linked to records data would be
possible  for  a  high  proportion  of  persons 
sampled  in  the  1979  Research  Panel. 

Second,  two  distinct  projects  were  undertaken 
to  examine  the  feasibility  of  linking  1979 
Research  Panel  data  to  benefit  records  of  the 
Supplemental  Security  Income  (SSI)  program. 
The  first  involved  a  match  of  survey  and  admin- 
istrative records  using  the  1979  Research  Panel 
SSI  subsample  and  SSI  administrative  tapes  in 
order  to  validate  information  common  to  both 
sources  and  enhance  the  survey  database. 
Overall,  3,950  sample  persons  in  the  1979 
Research  Panel  were  matched  with  the  SSI  data 
sets,  yielding  a  final  match  rate  of  99  per- 
cent. However,  analyses  of  data  quality  on 
this  survey-administrative  data  match  have  not 
yet  been  conducted,  and,  since  these  list  frame 
sample  cases  are  not  included  in  the  public  use 
microdata  files  (NTIS,  1983),  this  project  is 

analysts. Similarly, the second linking pro-
ject, an SSI "domain match", was designed to
determine  the  number  of  persons  included  in  the 
panel  through  the  area  and  Basic  Educational 
Opportunity  Grants  (BEOG)  subsamples  who  were 
also  in  the  frame  used  to  select  the  SSI  sub- 
sample.  Employing  a  match  indicator  code 
algorithm  using  validated  SSN's,  in  combination 
with name, race, and date of birth, a reasonable
match was achieved, albeit over a
longer time period than would be required to
support multiple frame estimation (Kasprzyk,
1983b). Since only the area frame cases are
included  in  the  public  use  files,  however, 
multiple  frame  estimation  is  not  required  for 
substantive  analyses  of  these  data. 
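
The source does not describe the match indicator code algorithm itself, so the following is only a hypothetical illustration of the general idea: compare a validated SSN together with name, race, and date of birth, and record which fields agree. The `Person` type, the four-character code, and the acceptance rule in `is_match` are all assumptions made for this sketch, not the procedure actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    ssn: str
    name: str
    race: str
    birth_date: str  # ISO date string, e.g. "1950-01-02"


FIELDS = ("ssn", "name", "race", "birth_date")


def match_indicator(a: Person, b: Person) -> str:
    """Return one character per field: '1' if the field agrees, '0' otherwise."""
    def same(field: str) -> bool:
        # Compare after trimming whitespace and ignoring case.
        return getattr(a, field).strip().upper() == getattr(b, field).strip().upper()

    return "".join("1" if same(field) else "0" for field in FIELDS)


def is_match(code: str) -> bool:
    # Hypothetical rule: the SSN must agree, plus at least two of the other three fields.
    return code[0] == "1" and code[1:].count("1") >= 2


# Made-up records for illustration only.
survey = Person("000-00-0001", "Jane Doe", "W", "1950-01-02")
frame = Person("000-00-0001", "JANE DOE ", "W", "1950-02-01")

code = match_indicator(survey, frame)
print(code, is_match(code))
```

Recording a per-field code instead of a single yes/no result keeps borderline cases available for manual review, which is one plausible reason to use an indicator code at all.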

Finally,  in  an  effort  to  determine  the  incre- 
mental costs  of  following  movers  (an  integral 
feature  of  the  survey  design  for  the  1979 
Research Panel and the SIPP), interviewers were
asked  to  keep  a  systematic  record  of  their 
mileage  and  time  spent  in  discovering,  locat- 
ing, and  following  up  persons  or  households 
that  moved.  A  detailed  analysis  of  this 
Mover's  Cost  Study  is  presented  by  White  and 
Huang  (1982),  who  (among  other  things)  reported 
a  mover  household  follow-up  rate  of  76  percent 
(with  an  eligible  person  interview  rate  of  92 
percent  in  interviewed  households)  and  a  cost 
increase  of  approximately  8  percent  attribut- 
able to  following  movers.  Of  particular  inter- 
est to  potential  policy  analysts  of  these  data 
is  that  nearly  78  percent  of  the  1979  Research 
Panel Wave 6 sample households had never moved.
Relative  to  other  longitudinal  databases  where 
movers  are  not  followed,  then,  sample  attrition 
due  to  this  factor  is  clearly  lower  in  the  1979 
Research  Panel,  and  estimates  involving  vari- 
ables related  to  residential  mobility  less 
subject  to  such  nonresponse  bias. 
CONCLUSION 


In conclusion, it should be clear that, aside from their great
analytic  potential,  some  of  these  tests  and 
experiments may also have a deleterious or
confounding impact on certain substantive
analyses that might be conducted using data
from the 1979 Research Panel. And, in a few
cases, these methodological tests have some positive
implications for such analyses as well. Such
implications  range  from  some  obvious  defici- 
encies in  some  of  these  data  highlighted  by 
these  field  tests  to  more  subtle  influences  on 
data  quality  and  variances  due  to  the  experi- 
mental treatments  imposed  on  the  survey  design. 
Especially  with  regard  to  the  latter,  the 
positive  benefit  of  including  such  methodologi- 
cal tests  in  the  survey  design  is  that  the 
potential  influence  of  such  factors  on  substan- 
tive results  from  this  survey  may  be  directly 
assessed  in  data  analyses.  It  is  important  to 
note,  however,  that  if  these  factors  are  not  so 
examined  their  influence  may  lead  to  distorted 
or spurious conclusions. By describing some
possible roadblocks that these tests and
experiments may throw in the way of substan-
tive analysis and interpretation, this paper
has  sought  to  illustrate  the  need  for  both 
consumers  and  analysts  of  these  data  to  keep 
their  methodological  nature  clearly  in  mind 
and,  where  possible,  to  assess  directly  the 
potential  influence  of  these  factors  on  re- 
search results. 

REFERENCES

Czajka, John L.
  1982 "How complete was food stamp reporting in Wave II of the
  ISDP Panel." Manuscript. Washington, D.C.: Mathematica Policy
  Research, Inc.

Czajka, John L.
  1983 "Subannual income estimation." Pp. 87-97 in Martin H. David
  (Ed.), Technical, Conceptual, and Administrative Lessons of the
  Income Survey Development Program (ISDP). New York: Social
  Science Research Council.

Coder, John F.
  1980 "Some results from the 1979 Income Survey Development
  Program Research Panel." Pp. 540-545 in Proceedings of the
  Section on Survey Research Methods, American Statistical
  Association.

David, Martin H.
  1983 "Measuring income and program participation." Pp. 3-20 in
  Martin H. David (Ed.), Technical, Conceptual, and Administrative
  Lessons of the Income Survey Development Program (ISDP). New
  York: Social Science Research Council.

Kaluzny, Richard
  1982 Evaluation of Experimental Effects, 1979 Research
  Panel--Wave 1. Final Report. Princeton, NJ: Mathematica Policy
  Research, Inc.

Kaluzny, Richard
  1982 Evaluation of Experimental Effects, 1979 Research
  Panel--Wave 2. Draft Final Report. Princeton, NJ: Mathematica
  Policy Research, Inc.

Kasprzyk, Daniel
  1983a "Some research issues for the Survey of Income and Program
  Participation." Pp. 686-691 in Proceedings of the Section on
  Survey Research Methods, American Statistical Association.

Kasprzyk, Daniel
  1983b "Social security number reporting, the use of
  administrative records, and the multiple frame design in the
  Income Survey Development Program." Pp. 123-141 in Martin H.
  David (Ed.), Technical, Conceptual, and Administrative Lessons
  of the Income Survey Development Program (ISDP). New York:
  Social Science Research Council.

Kulka, Richard A.
  1983 "Tests and Experiments." Pp. 4-1 to 4-39 in P. Nileen Hunt,
  et al., ISDP 1979 Research Panel Documentation. Springfield, VA:
  National Technical Information Service.

Moore, Jeffrey C., and Daniel Kasprzyk
  1984 "Month-to-month income recipiency changes in the ISDP."
  Paper presented at the annual meetings of the American
  Statistical Association, Section on Survey Research Methods,
  Philadelphia, August 13-16.

Roman, Anthony M., and Diane V. O'Brien
  1984 "Findings from the student follow-up investigation of the
  1979 Income Survey Development Program." Paper presented at the
  annual meetings of the American Statistical Association, Section
  on Survey Research Methods, Philadelphia, August 13-16.

Vaughan, D.R., and C.G. Lancaster
  1980 "Applying a cardinal measurement model ... Synopsis of a
  preliminary look." Pp. 546-551 in Proceedings of the Section on
  Survey Research Methods, American Statistical Association.

White, Glenn D., Jr., and Hertz Huang
  1982 "Mover follow-up costs for the Income Survey Development
  Program." Pp. 376-381 in Proceedings of the Section on Survey
  Research Methods, American Statistical Association.

Whiteman, T. Cameron
  1983 "Lessons to be learned from ISDP: The measurement of
  nonfarm self-employment income." Pp. 73-82 in Martin H. David
  (Ed.), Technical, Conceptual, and Administrative Lessons of the
  Income Survey Development Program (ISDP). New York: Social
  Science Research Council.

Ycas, Martynas, and Charles A. Lininger
  1981 "The Income Survey Development Program: Design features and
  initial findings." Social Security Bulletin 44 (November):
  13-19.




SOME  DATA  COLLECTION  ISSUES  FOR  PANEL  SURVEYS  WITH  APPLICATION 
TO  THE  SURVEY  OF  INCOME  AND  PROGRAM  PARTICIPATION 

Anne  C.  Jean  and  Edith  K.  McArthur,  Bureau  of  the  Census 


Introduction:  Need  for  a  Longitudinal  Survey. 

The Survey of Income and Program Participation (SIPP) is designed
to collect data which will improve our understanding of the income
distribution, wealth, and poverty in this country. Information
collected in the survey will be useful for planners and program
administrators in areas such as income support programs and health
care. The survey is longitudinal in the sense that the same persons
are interviewed periodically over an approximately 2 1/2 year
period. This implies following persons and updating information
that reflects changes in their lives and in the composition of the
households of which they are members, before, during, and after
these changes occur. Persons in SIPP are interviewed every four
months. At each interview, household members 15 years old or over
are asked to report on income sources, amounts and employment for
each of the previous four months.

With SIPP data we will be able to observe the effects over time of
changes in receipt of different types of income upon the total
income of a household; we will also see the effects of household
composition change, such as the birth of a child or a marital
separation, on participation in Federal transfer programs. In the
past, analysts have often relied upon the income data collected in
cross-sectional surveys, such as the March supplement of the
Current Population Survey (CPS). The CPS describes household
membership at a point in time, while obtaining income data for the
entire previous calendar year. These data are consequently
dependent on the household respondent's recall of events over the
whole previous year. Thus many assumptions are made and monthly
data cannot be collected accurately.

Implementation  of  a  Longitudinal  Survey. 

The 1984 panel is the first panel of the SIPP. During the four
months constituting Wave 1, that is, October 1983 through January
1984, Census interviewers visited approximately 26,000 addresses
located in 174 primary sampling units (PSUs) nationwide. The
addresses were evenly distributed among four rotation groups, and
each month one rotation group is assigned for interview. Nine
interviews at four-month intervals were scheduled for three
rotations; the fourth rotation was scheduled for eight interviews.
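
The rotation pattern described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that rotation groups are numbered in calendar order starting in October 1983 and that group 4 is the one scheduled for eight interviews (the text says only "the fourth rotation").

```python
from datetime import date


def add_months(d, n):
    """Return the first day of the month n months after (or before) d."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


def interview_schedule(first_month=date(1983, 10, 1), interval=4,
                       interviews=(9, 9, 9, 8)):
    """Map rotation group -> list of interview months.

    Assumption: group g starts g-1 months after the first interview month,
    and the last group is the one with only eight interviews.
    """
    schedule = {}
    for g, n_interviews in enumerate(interviews):
        start = add_months(first_month, g)
        schedule[g + 1] = [add_months(start, w * interval)
                           for w in range(n_interviews)]
    return schedule


def reference_months(interview_month, length=4):
    """The reference period is the `length` months preceding the interview."""
    return [add_months(interview_month, -k) for k in range(length, 0, -1)]


if __name__ == "__main__":
    sched = interview_schedule()
    for group, months in sched.items():
        print(group, [m.strftime("%b %Y") for m in months[:3]], "...")
```

Under these assumptions the second interviews fall in February through May 1984, which agrees with the Wave 2 dates given later in the paper.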

The shift from an address sample for the first visit to a person
sample in subsequent visits presented unique challenges to the
planning staff, regional office staffs, and interviewers. Updating
procedures for the address listings, noninterview classifications,
interviewing procedures, and many other activities required for
surveys maintaining an address sample were not appropriate. New
controls and follow-up procedures, some requiring interregional
office cooperation, were implemented. Interviewers received
extensive training on new noninterview classifications and movers'
procedures. Office staff maintained extensive clerical controls to
guarantee the receipt of control cards and questionnaires from
interviewers and to monitor the processing of over 40,000 person
records that were uniquely identified.

The remainder of this paper describes the Wave 1 field procedures
associated with the implementation of the address sample and the
follow-up procedures for subsequent waves. Included is an
explanation of the SIPP identification system and those field
operations designed specifically for sample maintenance and
control. Some preliminary results of the 1984 panel follow-up are
given and, finally, proposals for improving the follow-up system in
future panels are discussed.

Wave 1 Address Sample Procedures.

Field activities for the first SIPP interview were similar to
operations undertaken for other major surveys that are basically
cross-sectional, such as CPS and the National Crime Survey (NCS).
Interviewers listed specific addresses of living quarters either
prior to or at the time of the interview visit. Reasons for
differences between the number of expected units based on census
address lists and the number of units listed by the interviewer
were researched by the office staffs. During the first interview,
the address was verified and the unit was classified as a housing
unit or OTHER unit according to census definitions. Coverage
questions were asked to determine if EXTRA or additional units were
located at the address, and the interview status of the address was
recorded.

The interview status distinguished interviewed households from
noninterviews. Noninterviews were further classified by type. For
example, Type A noninterviews include all eligible households for
which interviews were not obtained, such as refusals or cases where
no one was home each time the interviewer visited. Types B and C
noninterviews were recorded for addresses containing no eligible
household, such as vacant addresses, or units under construction or
being demolished.

In an interviewed household the interviewer listed all persons
currently living or staying at the address, and applied a set of
household membership rules to classify each person. Listed persons
were classified as household members if the sample address was
their usual place of residence as of the date of interview. The
specific rules for household membership in SIPP are identical to
those used in CPS. All household members listed in Wave 1 were
designated as sample persons. After listing all household members,
demographic information, such as age, sex, and relationship, was
obtained for each household member and a SIPP questionnaire was
completed for each household member who was 15 years old or over.

Development of Special Procedures for the Longitudinal Survey.

The procedural differences between SIPP and most other major
household surveys conducted by the Census Bureau begin with the
second interview. While other major surveys such as the CPS and NCS
return to the same address for each subsequent visit regardless of
whether the occupants of that address change, the SIPP interviewer
returns to interview the same sample persons, that is, persons
listed during the first interview. If persons move to a new
address, they are followed and interviews are obtained at the new
address. Between March 1981 and March 1982 almost 17 percent of the
population of the United States moved.1/ If SIPP did not follow
movers from the original sample households, we would lose the
capability of observing the effects of many major changes in the
original sample households, and the person sample would be biased,
since it would not include movers.

Interviewers who discover that a Wave 1 sample person has moved
(usually while updating the household roster) are instructed to
inquire for new addresses at the original address and, if further
inquiry is necessary, to contact mail carriers, rental agents, real
estate companies and postal supervisors. Other sources may be used,
such as an employer or a contact person (this is a person
identified by the respondent during the initial interview as one
who would usually know where the respondent was, such as a relative
or a close personal friend). Occasionally, interviewers contact
other persons with the same last name listed in local telephone
books, although this procedure is not specified in their follow-up
instructions. Beginning with the third interview visit, change of
address notification forms are left with respondents and
respondents are encouraged to mail these to the census regional
offices (the address of the appropriate census regional office is
preprinted on the form). In addition, advance letters are mailed to
respondents before each interview; if the respondent no longer
lives at that address the post office is requested to provide a
forwarding address.

The regional office staff determines whether a new address will be
assigned for a personal visit. Personal visits are required for all
new addresses located in SIPP PSUs or within 100 miles of a SIPP
PSU. Telephone interviews are encouraged for all sample persons who
have moved to an address within the United States located more than
100 miles from a SIPP PSU. The following persons are excluded from
mover follow-up:

(1) Persons who join Wave 1 sample persons in later waves are not
    followed to new addresses unless these additional persons
    remain with Wave 1 sample persons who are 15 years old or over;

(2) Persons who move out of the sample universe are not followed.
    These are persons who become institutionalized, move outside
    of the United States or live in an Armed Forces barracks;

(3) Children under 15 who move and are not accompanied by a sample
    person who is 15 years old or over are not followed.

The geographic area covered by personal visit follow-up is
extensive. Based upon the 1980 census population distribution,
about 130 million persons live in areas within SIPP PSUs; another
87 million persons live within 100 miles of the outer boundary of a
SIPP PSU. We counted 226 million persons in 1980; 217 million are
within our currently covered areas, or 96 percent of the
population.
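
The follow-up rules above (the three exclusions and the 100-mile threshold) can be read as a small decision procedure. The sketch below is one possible reading, not the Bureau's operational logic: the `Mover` record, its field names, and the returned labels are all hypothetical, and the distance is assumed to be zero for an address inside a SIPP PSU.

```python
from dataclasses import dataclass


@dataclass
class Mover:
    age: int
    wave1_sample_person: bool
    # True if the person is moving together with a Wave 1 sample person
    # who is 15 years old or over.
    with_adult_sample_person: bool
    # False if institutionalized, outside the United States, or living
    # in an Armed Forces barracks.
    in_universe: bool = True
    # Assumed to be 0 when the new address lies inside a SIPP PSU.
    miles_from_nearest_psu: float = 0.0


def follow_up_mode(m):
    """Return 'not followed', 'personal visit', or 'telephone'."""
    if not m.in_universe:                                   # exclusion (2)
        return "not followed"
    if not m.wave1_sample_person and not m.with_adult_sample_person:
        return "not followed"                               # exclusion (1)
    if m.age < 15 and not m.with_adult_sample_person:       # exclusion (3)
        return "not followed"
    if m.miles_from_nearest_psu <= 100:
        return "personal visit"
    return "telephone"


if __name__ == "__main__":
    print(follow_up_mode(Mover(30, True, False, miles_from_nearest_psu=40)))
    print(follow_up_mode(Mover(30, True, False, miles_from_nearest_psu=250)))
```

Note that the text says telephone interviews are "encouraged" rather than required beyond 100 miles, so the last branch is a simplification.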

Of the 17 percent of the population that moved between March 1981
and March 1982, the great majority moved only a short distance:
about two-thirds of the movers stayed in the same county (10
percent of the total population).2/ If persons or households move
within the same county (or to a nearby county), the new address is
usually assigned to the same interviewer for follow-up. The
remaining third who move outside of their original county are
usually transferred to another interviewer. This occasionally
involves a transfer between two census regional offices.3/

Of the 26,024 addresses included in the original SIPP address
sample, 19,878 addresses were interviewed households in Wave 1 and
were reassigned for a second visit. The 6,146 addresses reported as
noninterview at the time of the first visit were not reassigned. Of
these noninterviews, 1,019 were eligible households whose members
refused to participate in the survey, or were temporarily absent,
unable to be located or not interviewed for other reasons. Survey
planners were reluctant to reassign in Wave 2 those Wave 1 eligible
noninterviews because of the added complexity for both the
interviewers and the processing system.4/

Interviewers visited sample addresses for the second interview
during February through May 1984 and attempted to locate and
interview the approximately 40,000 sample persons interviewed
during the first visit. New persons not present initially were
added to the household rosters, provided original sample persons
were still included on the roster. Any new persons who were
household members 15 years old or older were also eligible for
interview. If no sample person remained at an address, no
interviews were conducted at that address, but interviewers were
required to follow the sample persons to their new addresses.

The  SIPP  Identification  System. 

The SIPP Identification System is a numbering system designed to
provide a unique unchanging identifier for each person in an
interviewed household. The person identifier is used to link data
from more than one interview for the same individual regardless of
what moves have taken place or what changes in household membership
have occurred since Wave 1. In addition, the ID system provides the
means for grouping individuals into unique households in each wave.
This is an important attribute, which allows for the tracking and
identification of changing household membership: persons moving
away can be linked to each household of which they have been a
member since their first interview. However, no attempt is made
during the field operations to define or number each "different"
household for longitudinal analysis.

The components of the operational SIPP identification system are:

PSU number - 3 digits
Segment number - 4 digits
Serial number - 2 digits
Address I.D. - 2 digits
Entry address I.D. - 2 digits
Person number - 3 digits

The PSU and segment numbers are assigned by Washington staff during
sample selection. The 3-digit PSU number identifies a county or
group of counties and is the same number used by other census
surveys, such as the CPS and the NCS. As a sample of segments, that
is, clusters of housing units, is drawn from a PSU, the segments
are uniquely numbered within each PSU, using a 4-digit number. The
clusters generally range in size from two to four housing units.
Office staff in the 12 regional offices are responsible for
assigning the 2-digit serial number. The 2-digit serial number is
assigned sequentially in Wave 1 to each SIPP living quarters within
a segment.

The 9-digit combination of PSU, segment, and serial number uniquely
identifies each sample address for the first interview. As a
result, SIPP households interviewed during Wave 1 (October
1983-January 1984) can be uniquely identified with these three
components: PSU, segment, and serial number. The PSU, segment, and
serial numbers never change, regardless of movers and new household
formations.

For SIPP, a 2-digit address ID code is added by office staff to
provide a means for identifying more than one unique household
associated with the same PSU, segment, and serial number. This
situation occurs after Wave 1, when an original Wave 1 household
splits up to form more than one household. The first digit of the
address ID code indicates the wave in which a new address is first
assigned for interview. The second digit sequentially numbers
households originating from the same PSU, segment, and serial
number. While not essential for Wave 1, an address ID code of 11
was assigned to all Wave 1 sample addresses. In later waves, as
SIPP sample persons move to new addresses, the office staff assigns
new address ID codes to each new address brought into the survey by
movers. Address ID codes assigned during a previous wave are
deleted from the processing system for the current and successive
waves if no SIPP sample persons remain at the address. Thus, the
combination of PSU, segment, serial number, and address ID code
uniquely identifies each sample address at each given wave. As only
one sample household is associated with a sample address, this
combination provides unique household identifiers for a given wave.
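
A minimal sketch of the address ID convention just described may be useful. The function name is hypothetical, and the sketch assumes that the second digit restarts at 1 in each wave, which is what the worked example later in this section suggests (codes 21 and 22 in Wave 2, then 31 in Wave 3) but which the text does not state outright.

```python
def next_address_id(wave, existing_ids):
    """Return the next 2-digit address ID code for a new address.

    `existing_ids` holds the codes already issued under one PSU, segment,
    and serial number. The first digit is the wave in which the address is
    first assigned; the second digit is a sequence number, assumed here to
    restart at 1 in every wave.
    """
    if not 1 <= wave <= 9:
        raise ValueError("wave must fit in a single digit")
    used = [int(code[1]) for code in existing_ids if code[0] == str(wave)]
    seq = max(used, default=0) + 1
    if seq > 9:
        raise ValueError("no single-digit sequence number left for this wave")
    return f"{wave}{seq}"


if __name__ == "__main__":
    issued = ["11"]                        # every Wave 1 address is coded 11
    issued.append(next_address_id(2, issued))
    issued.append(next_address_id(2, issued))
    issued.append(next_address_id(3, issued))
    print(issued)
```

The single-digit limits in the sketch follow from the 2-digit field width; how the field procedures handled a tenth split in one wave, if that ever arose, is not covered in the text.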

The person identification number is a 5-digit number consisting of
an entry address ID code and a person number. It is assigned by the
interviewer as each person is initially listed on the household
roster. As the interviewer lists the name of each person in the
household, he/she transcribes the current 2-digit address ID code
to each person's record. The 2-digit number is the entry address
ID. Next, the interviewer assigns a 3-digit person number to each
person. Numbers 101, 102, and so on, are assigned to persons at the
sample address in Wave 1; the numbers 201, 202, and so on, are
assigned to persons added to the roster in Wave 2; and so forth.
The first digit indicates the wave in which the person enters the
survey. This 5-digit number consisting of entry address ID and
person number is not changed or updated, except in rare instances
of merged households, which are described later.
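
The person-numbering rule in the paragraph above can be illustrated as follows. The helper names are hypothetical; the hyphenated form `22-201` mirrors the way person IDs are written in the worked example in this section.

```python
def next_person_number(wave, roster_numbers):
    """Next 3-digit person number for someone first listed in `wave`.

    Numbers run 101, 102, ... for Wave 1 entrants, 201, 202, ... for
    Wave 2 entrants, and so on, within one household roster.
    """
    base = wave * 100
    in_wave = [n for n in roster_numbers if base < n < base + 100]
    return max(in_wave, default=base) + 1


def person_id(entry_address_id, person_number):
    """Join the 2-digit entry address ID and the 3-digit person number."""
    return f"{entry_address_id}-{person_number}"


if __name__ == "__main__":
    roster = []
    for _ in range(4):                       # four persons listed in Wave 1
        roster.append(next_person_number(1, roster))
    print([person_id("11", n) for n in roster])
    print(person_id("21", next_person_number(2, [103])))
```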

Thus, the 14-digit combination of PSU, segment, serial, entry
address ID, and person number uniquely identifies each person in
the SIPP survey and can be used to link data for persons across
waves. The PSU, segment, serial, and address ID code uniquely
identifies each household in a given wave; and the PSU, segment,
and serial number can link all households in subsequent waves back
to the original Wave 1 household.
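
As an illustration of how the 14-digit person identifier is put together, the sketch below concatenates the five components using the field widths listed earlier (3, 4, 2, 2, 3). The zero-padding, the concatenation order, and all function names are assumptions made for the example; the sample values are invented.

```python
from typing import NamedTuple


class PersonKey(NamedTuple):
    psu: str
    segment: str
    serial: str
    entry_address_id: str
    person_number: str


# Field widths in digits: PSU, segment, serial, entry address ID, person number.
WIDTHS = (3, 4, 2, 2, 3)


def build_key(psu, segment, serial, entry_address_id, person_number):
    """Concatenate the five components into one zero-padded 14-digit string."""
    parts = (psu, segment, serial, entry_address_id, person_number)
    return "".join(str(p).zfill(w) for p, w in zip(parts, WIDTHS))


def parse_key(key):
    """Split a 14-digit key back into its components."""
    if len(key) != sum(WIDTHS) or not key.isdigit():
        raise ValueError("expected a 14-digit numeric key")
    fields, pos = [], 0
    for w in WIDTHS:
        fields.append(key[pos:pos + w])
        pos += w
    return PersonKey(*fields)


def original_household(key):
    """First 9 digits (PSU, segment, serial) link back to the Wave 1 household."""
    return key[:9]


if __name__ == "__main__":
    key = build_key(12, 345, 6, 11, 101)     # invented example values
    print(key, parse_key(key), original_household(key))
```

Because the entry address ID and person number never change, a key built this way stays fixed for a person across waves, while the current address ID is carried separately on each wave's record.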

An example of the numbering scheme may help to clarify it further.
Consider a Wave 1 household. There is a basic control number
consisting of PSU, segment and serial number, along with the
address ID code. At the time of the first visit, four persons are
listed: a father, mother, son and daughter. Each is assigned the
current address ID code 11, along with a three-digit person number:
101, 102, 103, and 104.

The interviewer returns four months later and finds that the mother
and father remain at the original address. The two children have
moved to separate new addresses and both have married. The separate
new addresses retain the basic control number (PSU, segment and
serial number). One new address receives address ID code 21, the
other receives 22. A new person, the son's wife, is added. She is
added at an address coded 21 in Wave 2, so she receives an entry
address ID of 21 and person number 201. The daughter's husband is
added at an address coded 22, so his person ID is 22-201. The
original persons, the son and daughter, do not change their person
ID's.

In Wave 3, the mother and father retire and move to Florida. No one
lives at the original address. The mother and father moved in Wave
3, so their new address ID code is 31. Their person ID's remain the
same. The son and his wife haven't moved in Wave 3. Their address
ID's do not change. The daughter is still at the same address, so
her address ID doesn't change. However, she has split up with her
husband and he has moved out. Since her husband is not an original
Wave 1 sample person, he is not followed to his new address.
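
The three-wave example above can be written out as a small table of records to show how the identifiers support both kinds of linkage: grouping persons into households within a wave, and tracing one person across waves. The record layout and function names are hypothetical; the ID values are taken from the example.

```python
from collections import defaultdict

# (wave, current address ID, person ID, role), all under one control number.
RECORDS = [
    (1, "11", "11-101", "father"),
    (1, "11", "11-102", "mother"),
    (1, "11", "11-103", "son"),
    (1, "11", "11-104", "daughter"),
    (2, "11", "11-101", "father"),
    (2, "11", "11-102", "mother"),
    (2, "21", "11-103", "son"),
    (2, "21", "21-201", "son's wife"),
    (2, "22", "11-104", "daughter"),
    (2, "22", "22-201", "daughter's husband"),
    (3, "31", "11-101", "father"),
    (3, "31", "11-102", "mother"),
    (3, "21", "11-103", "son"),
    (3, "21", "21-201", "son's wife"),
    (3, "22", "11-104", "daughter"),
    # The daughter's husband left in Wave 3 and is not followed.
]


def households(wave):
    """Group person IDs by current address ID for one wave."""
    groups = defaultdict(list)
    for w, address_id, pid, _role in RECORDS:
        if w == wave:
            groups[address_id].append(pid)
    return dict(groups)


def history(pid):
    """List (wave, address ID) pairs for one person across waves."""
    return [(w, address_id) for w, address_id, p, _role in RECORDS if p == pid]


if __name__ == "__main__":
    print(households(3))
    print(history("11-101"))
```

The person ID stays fixed in every row, while the address ID column records where each person was found in each wave.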

As mentioned previously, the operational phase makes no attempt to
apply longitudinal household definitions to the changing
relationships, nor to number households longitudinally. However, as
analysts develop longitudinal definitions, the current data base
must be able to provide the information required to support these
definitions. Further refinements in the questions asked at each
interview may be implemented as the needs of a longitudinal
household definition become more precisely specified.

The SIPP numbering system has several advantages over alternative
schemes that have been considered:

(1) The portion of the control number consisting of PSU, segment,
    and serial number is similar to the numbering system used by
    other major surveys conducted by the Bureau.

(2) Interviewers are able to assign person numbers during the
    course of the interview. The person number is used in various
    parts of the questionnaire during the interview. This number
    is also transcribed to several other survey documents during
    the interview and immediately afterward during clerical coding
    operations. A person number assigned after the time of
    interview does not provide this immediate linkage.

(3) The person number itself has relatively few digits, reducing
    the possibility of transcription errors.

Several disadvantages have been noted:

(1) Duplications of person numbers for additional persons (persons
    added after Wave 1) can conceivably occur in situations where
    households have split and are in different regional office
    jurisdictions. The computer processing system identifies these
    duplicates and the regional office staff corrects them during
    processing.

(2) Mergers between two separate sample households require special
    procedures. If this situation occurs, one set of controls is
    retained for the merged household. New person numbers are
    assigned to those persons who lose their original identifiers.
    Interviewers record both the old and new ID numbers on the
    control card to provide a means for linking the two ID's. By
    the end of the second wave, this had occurred once.
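
The duplicate check mentioned under the first disadvantage might look something like the sketch below. The paper does not describe how the processing system actually detects duplicates, so this is only one plausible approach: within a single wave, flag any 14-digit key reported for more than one distinct person. The record layout, office names, and person names are invented for illustration.

```python
from collections import defaultdict


def find_duplicate_ids(records):
    """Return keys that appear with more than one distinct (office, name).

    `records` is an iterable of (key, regional_office, name) tuples for a
    single wave. A key reported for two different persons is a duplicate
    that would need to be corrected by the regional office staff.
    """
    seen = defaultdict(set)
    for key, office, name in records:
        seen[key].add((office, name))
    return {key: sorted(who) for key, who in seen.items() if len(who) > 1}


if __name__ == "__main__":
    wave2 = [
        ("01203450621201", "Office A", "A. Smith"),
        ("01203450621201", "Office B", "B. Jones"),   # same key, other person
        ("01203450611101", "Office A", "C. Smith"),
    ]
    print(find_duplicate_ids(wave2))
```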

Monthly  Cross-Sectional  Households. 

While the ID system provides identifiers for each household in a
given wave, it does not identify households for a given month.
Monthly cross-sectional households are not constructed in the
field; rather they are constructed during processing using
information obtained during each wave. During each visit,
demographic characteristics such as changes in marital status,
changes in reference person (householder) status, and changes in
household relationships are recorded on a control card. The same
control card is used for each visit to the same address. If a
sample person moves to a new address, the interviewer prepares a
new control card for the new address and transcribes any
information that is not expected to change. Date entered (month and
day) and date left (month and day) are recorded on the control card
for every entry and exit from an address. Reasons for entries and
exits are coded:

Entry    1 - birth
         2 - marriage
         3 - other
         4 - 5/

Exit     5 - deceased
         6 - institutionalized
         7 - living in Armed Forces barracks
         8 - moved outside of country
         9 - separation or divorce
        10 - person who joined a household during Wave 2 or
             later and who is no longer living with any
             sample person
        11 - other
        99 - listed in error

The dates entered and left are used during processing to group
persons into households for a given month. A person entering a
household before mid-month is considered to be a member for the
entire month; a person entering after mid-month is considered not
to be a household member for that month. A similar mid-month cutoff
date is used for persons leaving households. As this monthly
household determination is done during processing, it does not
affect field operations, short of obtaining month and day of
entries and exits.
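
The mid-month rule can be sketched as a simple membership test. The paper does not give the exact cutoff day or say how the rule applies to someone who enters or leaves on the cutoff day itself, so the sketch assumes the 15th as the cutoff, counts an entry on or before that day as "before mid-month", and treats an exit on or before that day as meaning the person is not a member for that month. The function name is hypothetical.

```python
from datetime import date

CUTOFF_DAY = 15  # assumption: the text says only "mid-month"


def is_member(year, month, entered, left=None, cutoff_day=CUTOFF_DAY):
    """Is the person counted in the household for the given month?

    entered / left are the dates recorded on the control card;
    left is None if the person has not left the address.
    """
    cutoff = date(year, month, cutoff_day)
    if entered > cutoff:
        return False          # entered after mid-month (or in a later month)
    if left is not None and left <= cutoff:
        return False          # left by mid-month (or in an earlier month)
    return True


if __name__ == "__main__":
    print(is_member(1984, 3, entered=date(1984, 3, 10)))   # before mid-month
    print(is_member(1984, 3, entered=date(1984, 3, 20)))   # after mid-month
    print(is_member(1984, 4, entered=date(1984, 3, 20)))   # following month
```

Applying the test to every person with a given address ID, month by month, yields the monthly cross-sectional households that the wave-level identifiers do not provide on their own.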

Clerical  Field  Controls. 

The SIPP movers' procedures have long been recognized as ambitious,
requiring a system of field controls that are more extensive than
those in effect for other major surveys conducted by the Bureau.
Two standard forms are used for controlling interviewer
assignments, and a third control was developed specifically for
SIPP. All three forms are used during a clerical check-in at the
regional offices.

An interviewer's Assignment and Control form is completed for each
interviewer, listing every case in a given interviewer's
assignment. A copy is sent to the interviewer and a control copy is
kept in the office. As completed questionnaires are returned to the
office, they are checked in against this form. A second control
form lists all interviewers and the number of assigned cases for
each interviewer. Tallies are kept as material is returned. This
form gives supervisors a summary of the number of outstanding cases
for a given month. The third control, developed specifically for
SIPP, is a computer-generated listing of all persons listed as
household members in Wave 1. It includes names, person numbers,
interviewer codes, and interview status. The regional offices
update the listing during each wave and account for every
interviewed person as documents are received from interviewers.
These three forms provide the basis of the clerical check-in and
control. They must be updated to account for assignments that are
transferred between interviewers and between regional offices, and
they must be updated to include new persons entering SIPP after
Wave 1.

Two other control forms are used by the offices to facilitate the
movers' operation. One form is used to list the original address of
a sample household along with all subsequent addresses. It is used
primarily to control the assignment of address ID codes. A second
form, a worksheet, is used for transferring cases from one
interviewer to another by telephone. Because of time constraints,
transfers are done by telephone, and required control card
information (such as new address, names of persons, demographic
information for the movers, etc.) must be obtained from the
original interviewer and passed on to the new interviewer.

While the scope of this paper concerns field operations, some
mention must be made of two major features in the computerized
processing system designed for check-in and control.

(1) During the keying operation, all persons
listed on the control card who are 15 or
over and are current household members
must have an accompanying questionnaire.
This check is done automatically at each
keying station. Keying is done in the
regional offices, and immediate resolution
of missing questionnaires is required.

(2) At the end of each of the four months of
each wave, a centralized check-in is completed
in Washington. A control card
record must be transmitted for every
person showing an active status on a
master file, maintained in Washington, of
all active records. Offices cannot
close out an interview month until every
active-status person is accounted for
and some demographic data (age, race, and
sex) are verified to make sure that we
are not checking in the wrong person.
Each missing case is referred to the
appropriate regional office for resolution.
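
The two automated checks described above can be illustrated with a minimal sketch. It is not the Census Bureau's processing system: the record layout, the function names, and the use of an exact match on age, race, and sex are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RosterEntry:
    """One person listed on a control card (hypothetical layout)."""
    person_number: str
    age: int
    current_member: bool


def missing_questionnaires(roster, received_person_numbers):
    """Check (1): persons 15 or over who are current household members
    must each have an accompanying questionnaire."""
    received = set(received_person_numbers)
    return [
        entry.person_number
        for entry in roster
        if entry.age >= 15
        and entry.current_member
        and entry.person_number not in received
    ]


def unresolved_cases(master_active, transmitted):
    """Check (2): every active person on the master file needs a transmitted
    record whose (age, race, sex) tuple agrees with the master file.

    Both arguments map a person identifier to an (age, race, sex) tuple.
    Exact tuple equality is an assumption of this sketch.
    """
    problems = {}
    for person_id, demographics in master_active.items():
        if person_id not in transmitted:
            problems[person_id] = "missing"
        elif transmitted[person_id] != demographics:
            problems[person_id] = "demographic mismatch"
    return problems


def can_close_month(master_active, transmitted):
    """An interview month can be closed only when no case is unresolved."""
    return not unresolved_cases(master_active, transmitted)
```

Each entry returned by `unresolved_cases` corresponds to a case that would be referred back to a regional office.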

Experience  with  Following  Movers. 

Available data for follow-up interviews conducted
during February-May 1984 give an initial
indication of the success rates for the SIPP
follow-up.

(1) Percentage of movers found: about 80%.

(2) Percentage of movers lost: about 20%;
this represents 0.9 percent of all eligible
SIPP households.

When sample persons move from the address at
which they were contacted in the previous wave
(four months before), interviewers are instructed
to go through a series of steps to locate the
new address. If all the steps are "dead ends,"
they fill in a form which describes what they
know about the mover situation for those sample
persons. A review of the forms for Wave 2 available
at the time this paper was written (they
are submitted on a flow basis, and a form
was not submitted for some of the cases) illustrated
the kinds of events that took place leading
to the sample person's moving without leaving a
trace. In about half of the cases, all household
members moved, leaving no forwarding address.
For another quarter of the cases, one or more
persons had left the household, leaving other
members behind, but those other persons had no
information about the departee's whereabouts.
In an additional 15 percent of the cases, the
spouse (usually the husband) left the rest of
the family, and the remaining spouse could not
or would not give a forwarding address. The
remaining cases showed a variety of events;
for example, the person had moved and had no
permanent new address but was just
staying with various friends, and the interviewer
had no success in contacting him. The interviewers'
comments showed considerable effort in
attempting to track these movers.

Recommendations  for  future  SIPP  Panels. 

Improvements in the processing system and the
expansion of follow-up procedures are envisioned
for future panels. These recommended changes are
intended to improve sample coverage in a number
of areas.

In the 1984 panel, persons who leave the
sample universe (become institutionalized,
leave the country, or live in an Armed Forces
barracks) are dropped from the sample. 6/ As
of the 1980 census, about 2.5 million persons
were inmates of institutions such as
mental hospitals, homes for the aged, and
correctional institutions. Another 613,000
persons were living in military barracks. Demographers
estimate that about 160,000 persons
emigrate from the United States each year. 7/
As average stays in nursing homes are less than
60 days and live discharges account for about 75
percent of the discharges, a sample person who
goes into a nursing home is likely to come out
before the end of the SIPP panel. According to
current procedures, members of each of these
groups are reinstated only if they rejoin a SIPP
household.

For  the  SIPP  panel  beginning  in  January  1985, 
planning  is  underway  to  track  sample  persons  who 
become  institutionalized.  Interviewers  will 
obtain  the  name  of  the  institution  in  which  the 
person is residing. At each subsequent interview
they will determine whether the person is
still there, and if the person has been discharged
they will obtain a new address. It will then be
possible to follow sample persons leaving institutions
even if they do not rejoin active
SIPP households. There are no current plans to
track  sample  persons  who  move  outside  of  the 
country  or  to  an  Armed  Forces  barracks. 

Interviewers  may  return  to  an  address  in  the 
1984  panel  and  find  that  all  original  Wave  1 
sample  persons  have  left  but  one  or  more 
additional  persons  (who  joined  households  with 
sample  persons  after  Wave  1)  remain.  In  the  1984 
panel  no  interviews  are  conducted  at  that  address 
even  though  persons  currently  at  the  address 
lived  with  sample  persons  during  at  least  part  of 
the  reference  period.  For  future  panels  a  final 
interview  will  be  conducted  for  the  additional 
persons  remaining  at  the  address.  As  in  the  1984 
panel,  no  subsequent  follow-up  is  planned  for 
these  persons. 

As described earlier, in the 1984 SIPP, only
persons who are 15 or over are followed to new
addresses; sample persons who are under 15 years
old are not followed unless they move with a
sample person who is 15 or over. However, once
they become 15 they are eligible for interview
along with other members of their households.
They are missed in the 1984 panel if they move
before turning 15 and are not accompanied by a
sample person who is 15 years old or older. Their
absence may result in some bias in the survey
data. In future SIPP panels, all sample persons
who are 12 years old or older at the time of the
first interview will be eligible for follow-up.
When a person who was 12 years old at the time of
the first interview moves by him- or herself to
a new address, occupants of the new household
will be interviewed according to standard procedures;
that is, persons 15 years old and over
will be administered a questionnaire. When the
sample person turns 15, that person will also be
administered a questionnaire.
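
The follow-up rules for young sample persons can be summarized in a short sketch. The function and parameter names are hypothetical. The sketch also assumes that, in future panels, children under 12 continue to be followed when they move with a sample person who is 15 or over, as in the 1984 panel; the text does not state this explicitly.

```python
QUESTIONNAIRE_AGE = 15      # persons 15 and over are administered a questionnaire
FUTURE_FOLLOW_UP_AGE = 12   # proposed cutoff, measured at the first interview


def followed_1984(age_at_move, moves_with_sample_person_15_plus):
    """1984 panel: a mover is followed if 15 or over, or if accompanied
    by a sample person who is 15 or over."""
    return age_at_move >= QUESTIONNAIRE_AGE or moves_with_sample_person_15_plus


def followed_future(age_at_first_interview, moves_with_sample_person_15_plus):
    """Future panels: persons 12 or older at the first interview are
    eligible for follow-up on their own (the accompanied case is assumed
    to carry over unchanged from 1984)."""
    return (
        age_at_first_interview >= FUTURE_FOLLOW_UP_AGE
        or moves_with_sample_person_15_plus
    )


def gets_questionnaire(current_age):
    """A followed person is interviewed only once he or she is 15."""
    return current_age >= QUESTIONNAIRE_AGE
```

Under these rules a 13-year-old who moves alone is lost in the 1984 panel but remains in a future panel, and receives a questionnaire only after turning 15.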

A  number  of  other  recommendations  have  been 
made  for  future  SIPP  panels.     These  include: 

(1) Reassigning Wave 1 eligible noninterviews
in Wave 2. Interviewers will be provided
with instructions for obtaining household
rosters and assigning person numbers
retrospectively, i.e., as of a date
approximately four months prior to the date
of the second interview.

(2) Adjusting the computerized check-in system
to allow for new serial numbers (representing
persons or addresses) to be introduced
in Wave 2. This will provide flexibility
for including missed Wave 1 housing units.

(3) Developing a questionnaire that is appropriate
for telephone interviews. This
could be administered to persons who are
not followed for a personal visit.

(4)  Increased  automation  over  the  next  few 
years  will   eliminate  much  of  the  time 
consuming  clerical    operations  associated 
with  the  check-in,  control   and  monitoring 
of  assignments. 

In  summary,  SIPP  has  attempted  an  ambitious 
undertaking  by  implementing  and  attempting  to 
improve  an  extensive  follow-up  program.     Data 
users  will    be  the  ultimate  beneficiaries  and 
judges  of  the  program's  success. 


1/ U.S. Bureau of the Census, "Geographical
Mobility:   March   1981   to  March   1982."   Current 
Population  Reports,  Series  P-20,  No.  3BT: 
Issued  February  1984,  U.S.G.P.O. 
2/  U.S.  Bureau  of  the  Census,  op.  cit. 
3/ The United States is administratively divided
into  12  geographic  areas.     Each  area  consists  of 
a  group  of  states  under  the  jurisdiction  of  a 
census  regional   office. 

4/ Wave 2 interviews for households not originally
interviewed in Wave 1 require special procedures
for constructing household rosters. For
example, interviewers would need to obtain the
names of persons living at the address as of a
reference date four months prior to the Wave 2
interview. An appropriate Wave 1 person number
would be assigned (see the SIPP Identification
System explained later in this paper). However,
the 1984 computerized check-in system was designed
to reject any Wave 1 person number that
appeared for the first time in later waves.
5/  Code  4  is  used  in  circumstances  where  a  sample 
person  moves  to  an  address  already  occupied  by 
persons  not  previously  in  SIPP.     The  persons  not 
previously  in  SIPP  are  added  to  the  roster  and 
are  coded  "4." 

6/ It was decided not to obtain proxy information
for sample persons (as well as other
members of a household that has at least one
resident sample person) who die while they are
in a SIPP panel.

7/ Robert Warren and Jennifer Marks Peck,
"Foreign-Born Emigration from the United States:
1960 to 1970," Demography, Vol. 17, No. 1
(February 1980), pp. 71-84.
MANAGING THE DATA FROM THE 1979 INCOME SURVEY DEVELOPMENT PROGRAM


Pat Doyle, Mathematica Policy Research
Constance  F.   Citro,   National  Academy  of   Sciences 


During 1979 and 1980 the Department of Health
and   Human  Services  and   the  Bureau   of    the  Census, 
with  support   from  other   federal   government 
agencies   including   the  Food  and   Nutrition 
Service,    USDA,    administered  a   panel  study  of 
households   representative   of    the   civilian 
noninstitutionalized  population  in   the   United 
States   called   the    1979  Income    Survey  Development 
Program   (ISDP)  Research  Panel.      The   survey  was 
designed  as   the   final   pretest   for   the  Survey  of 
Income  and  Program  Participation   (SIPP)   which  had 
been  under  development   since    1975  and  was   fully 
implemented  in   late    1983.      The    1979   panel  study 
was  extremely  complex  due   to   the  efforts  put 
forth   to  improve  the  measurement   of   income,   net 
worth,    and   program  participation  and    to   increase 
the  information  available  on  behavior,   attitudes, 
expenses   and  disposable   income   of    the  population. 

The complexity of the 1979 ISDP survey design
led to the production of public use files which
are cumbersome to use, thus making it difficult to
access the newly available data for research.
This paper describes a project
conducted by Mathematica Policy Research
(MPR) under contract to the Food and Nutrition
Service, USDA, to solve the data access problems
through the use of data base management system
(DBMS) technology. The DBMS chosen for this work was
RAMIS II, developed and distributed by
Mathematica Products Group. The system developed
by MPR is referred to as the ISDP/RAMIS II
system.

It   is   important   to  note  that  a  number  of 
problems  that  were  confronted  in  designing  the 
access  system  described  in  this  paper  have  been 
resolved  in  the  release  of  the  public  use   ISDP 
files (in fact, data from the ISDP/RAMIS II
system were the source of some of these improvements).
Furthermore, some, but by no means all,
of these access problems have been explicitly
taken care of in the design of the SIPP. Consequently,
designing an access system for the new
survey should be easier than for the ISDP. It is
also  true  that  the  best   design  for  a  SIPP  access 
system  is  likely  not   to  be   the  design  chosen  for 
the  ISDP  system. 

In  the  subsequent   section,   an  overview  of   the 
panel  study  with  emphasis  on  the  contents  and 
problems  of   the  data  files   is  provided.     The 
report  concludes  with  an  overview  of  the  newly 
created system with a summary of the data problems
solved in the course of this work. For
detailed   information  on  the  contents  and  use   of 
the  ISDP  system,    the  reader  is  referred   to  Doyle 
and  Citro   (1984). 

Overview  of    the   ISDP  and   Its  Applications 

Figure   1   gives   a  graphic    representation  of    the 
key  features   of   the  ISDP  design.      Briefly,   note 
that: 

—  There were 6 waves of interviewing, providing
12 to 15 months of data for each
household.

—  Interviewing was staggered; one-third of
the sample was interviewed each month,
so that each rotation group had a different
3-month reference period.

—  This  pattern  was  regular,  except  that  the 
third  rotation  group,  for  various  reasons, 
was  skipped  over  Wave   4. 

—  Each  wave  asked  a  core   set   of   items, 
including  monthly   income   and   employment, 
plus  one-time  supplemental   items. 

The  SIPP  design   for   the   first   panel    is  very 
similar,    including  skipping  one  wave   for   part   of 
the  sample. 

The ISDP, by virtue of gathering detailed
month-by-month   data  over  a  span  of  at   least  a 
year,   offered   the  potential   for  exciting   research 
that   simply  could  not  be  carried  out  before. 
But,    to  make   it  possible   for  the   researchers   at 
MPR  to  realize  that  potential,   we  had   to  design 
an  access  system  that  would  do  the  following: 

—  Generate reports and analysis files from
individual waves, undoubtedly the easiest
way of using the data

—  Let researchers apply different rules to
identify households and families across
waves for longitudinal analysis

—  Link supplemental data collected in one
wave to core data in other waves

—  Make it possible to carry out sophisticated
statistical as well as tabular analysis of
the data

—  Make it possible to use the ISDP data with
data from other sources, for example, 1980
census summary data.


All of these access requirements apply equally
well to the SIPP.


Problems   for  Access  Posed   by   the   ISDP 

Various  design   features   of    the   ISDP  posed  more 
or   less   serious   problems   for  developing  an  access 
system  that  would  satisfy   the   requirements   just 
listed.     These  are  summarized  below. 

o  Staggered interviewing. The use of a
staggered interviewing schedule results in
a situation where data from more than one
interview must be accessed to study a
common calendar period for the entire
sample (except where the user can make do
with the single calendar month that is
common to all rotation groups within a
wave).

o  Skipping Wave 4. The alteration of the
interviewing schedule to have the third
rotation group skip over the Wave 4 interview
means that, although two-thirds of the
sample cases have a full 15 months of data
(from the five regular waves if they did
not attrite), the other third has only 12
months. Moreover, the third rotation group
does not have responses to any of the
topical supplemental items asked at Wave 4.

o     Different   reference   periods   for  wave- 
specific    information.      For  any  one 
interview,    there   is  a   potential  mismatch 
between  the  wave-specific   data  and    the 
monthly   data,    given  that   monthly   data  for 
the  month  of   an  interview  were  actually 
asked  at   the  subsequent   interview. 

o  Identifier problems. The Census Bureau
encountered problems in uniquely identifying
individuals across the survey waves,
necessitating creation of a new unique
person identifier, called the link index,
as a separate file from the interview data
files. It also turned out that the Bureau
erroneously included some persons on the
cross-section interview files who were not
in fact present and vice versa.

o  One-time wave-specific supplemental data.
The fact that important data were asked on
a supplemental one-time basis creates
problems for using these items together
with the monthly and quarterly data.

o  School lunch data problems. The ISDP files
include valid data only for the last child
in a family, and these data were erroneously
written into the records for all other
children.

o  Lack  of  editing  on  Wave  6.  In  the  case  of 
Waves  1-5,  the  Census  Bureau  performed 
edits  on  demographic  variables  and  also 
edited  income  recipiency  flags.  No  editing 
was  performed  on  the  Wave  6  data,  which 
were  collected  in  an  entirely  different 
format. 

o  Asset  income  reporting  experiment.  This 
experiment  creates  practical  problems  of 
associating  asset  income  data  with  other 
data   for  each  month   of   the   panel. 

o  Incomplete determination of monthly unit
composition. The design of the cross-
section files, coupled with a high level of
noise in the data on arrival and departure
dates, made it very difficult to assemble a
stream of monthly unit composition indicators
consistent with reported monthly
economic data.

o  Absence of longitudinal weights and
imputations for missing data. The cross-
section interview files contain weights and
also imputations for missing income and
employment data that were constructed
strictly on a cross-section basis and are
not suitable for longitudinal studies.

o  Absence of longitudinal editing. With the
exception of editing age and sex in the
construction of the unique identifiers, no
longitudinal edits were performed on the
demographic variables.
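
The staggered-interviewing problem listed above can be made concrete with a small sketch. The calendar indexing is hypothetical: it assumes three rotation groups interviewed in consecutive months, each reporting on the three months preceding its interview, and it ignores the third rotation group's skip of Wave 4.

```python
REFERENCE_LENGTH = 3        # months covered by each interview
ROTATION_GROUPS = (1, 2, 3)


def interview_month(wave, rotation_group):
    """Calendar month index of an interview under the assumed schedule."""
    return REFERENCE_LENGTH * wave + rotation_group


def reference_months(wave, rotation_group):
    """The three calendar months preceding the interview."""
    end = interview_month(wave, rotation_group)
    return set(range(end - REFERENCE_LENGTH, end))


def common_months(wave):
    """Calendar months covered by every rotation group within one wave."""
    return set.intersection(
        *(reference_months(wave, group) for group in ROTATION_GROUPS)
    )


def wave_covering(calendar_month, rotation_group):
    """Wave whose reference period contains the month, or None if the
    month precedes the rotation group's first reference month."""
    offset = calendar_month - rotation_group
    if offset < 0:
        return None
    return offset // REFERENCE_LENGTH + 1


def waves_needed(calendar_month):
    """Waves that must be read to cover one month for the whole sample."""
    return {wave_covering(calendar_month, group) for group in ROTATION_GROUPS}
```

Under these assumptions each wave shares exactly one calendar month across all rotation groups, and any other month requires data from two adjacent waves.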

These characteristics of the ISDP survey make
retrieval of the information for analysis cumbersome
and expensive. This is particularly true
for longitudinal applications of the data such as
the study of turnover in the Food Stamp Program.

The difficulty in using the ISDP for research
was compounded by the structure of the available
data files. At the time this project was carried
out, the most suitable input file was a concatenation
of cross-section files from all five
waves. The format for each cross-section was
similar to the public working files currently
available (NTIS, 1982) except that the family
level had not been fully developed. The records
from all five waves were grouped by PSUSERIAL, and
a level 1 record was created which recorded
information common to each group, such as rotation.
In addition to inserting the level 1
record, the Bureau also merged the link index
(constructed unique person identifier) and longitudinally
edited values of age and sex to this
file. However, the Bureau deleted from this file
the results of the cross-sectional imputations
for income and employment data. The rationale
for this omission was the unsuitability of these
imputations for longitudinal analysis, the purpose
of the concatenated file.

This file was extremely cumbersome to access
due to the lack of a true hierarchical structure,
the large number of different record types (data
from each topical module were recorded on a separate
record with a distinct record length and
layout), and the fact that some of the newly
created person identifiers were erroneous.

Overview  of   the   ISDP/RAMIS   II   System 

The objective of this data base development
effort, as noted above, was to take the information
available on the series of cross-section
files described above and array it in a manner
that would facilitate longitudinal as well as
cross-sectional analysis. The results of this
effort were two RAMIS II data bases, one called
SIPPMASTER and one called MH for monthly households.
SIPPMASTER is the main file in that all
of the data collected during each wave are stored
there. This file is used for all cross-section
applications as well as longitudinal applications
which do not involve the formation of longitudinal
households or other groupings of individuals.
The MH file is the data base designed to
support the construction of longitudinal units.
It essentially provides information on monthly
household, family, and food stamp unit composition.
The data in MH are arrayed to permit a
user to develop a definition of longitudinality
and apply that in the construction of a longitudinal
unit file. Once the longitudinal unit
itself is determined, the user can employ the
data stored in SIPPMASTER to derive variables
like total household monthly income which reflect
the longitudinal unit characteristics.

The remainder of this section provides an
overview of the contents of the ISDP/RAMIS II
system. A detailed discussion of the motivation
for choosing this file design and the procedures
required to develop this data base is given
in Doyle and Citro (1982).

SIPPMASTER.   Figure  2  displays  the  logical
organization of SIPPMASTER. It has a hierarchical
structure with fifteen levels, five of which
are real and ten of which are virtual. The five
real levels are wave, household, family, person
and month. Some relevant comments on each of
these levels follow:

Wave.   (Level  1)  Indicators  for  Waves  1 
through  5  are  contained  in  SIPPMASTER  on 
level  1.   The  data  from  Wave  6  are  treated 
as  supplemental  and  are  therefore  stored  in 
the  virtual  level  PM  (level  7).   SIPPMASTER 
is  physically  separated  into  5  data  bases, 
one for each wave. They are linked together
with RAMIS II USE commands to
logically form one data base.

Household. (Level 2) This reflects household
composition at the time of the interview.
The household identifier (HHID)
uniquely identifies households within
wave. It cannot be used to identify
households longitudinally. Non-interview
households in each wave have entries at
this level; however, data for all other
levels are zero. The contents of the
household level consist of the data found
in the household record in the cross-
section files prepared by the Census
Bureau.

Family.  (Level  4)  The  family  level  simply 
identifies  family  units  within  households 
as  they  existed  at  the  time  of  each 
interview.  Primary  individuals,  secondary 
individuals,  and  outmovers  are  treated  as 
one  person  families. 

Person. (Level 5) This contains interview-specific
data for each individual and
retrospective data that were not collected
for specific calendar months, such as total
weeks unemployed. The identifier for level
5 is the link index (called PERID in RAMIS
II), so that each person sampled is identified
in the same way across all waves. The
data for the person level were derived from
record type 5 of the cross-section files.
Some relevant points: outmovers in a given
wave are included for that wave but have 0
in the weight fields; the weights are
cross-sectional; all person identifiers
with values exceeding 200000 should be
deleted for longitudinal analysis but not
for cross-sectional analysis; corrected age
(CORAGE) should be used instead of edited
age (AGEED), except that corrected age is 0
on Wave 2; income recipiency flags on level
5 are not to be used to determine item nonresponse,
as they were retained here for
other reasons (for example, if the interest
flag in Wave 1 is 1 on level 5 but there is
no entry for that income type in the WU or
MU associated files, then the person was
reported to have had an interest-producing
asset but did not actually receive interest
income during the Wave 1 reference period).
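
Two of the person-level usage notes above lend themselves to a short sketch. PERID, CORAGE, and AGEED are the names given in the text; the dictionary layout, the WAVE key, and the fallback to AGEED on Wave 2 are assumptions of this illustration rather than documented behavior of the ISDP/RAMIS II system.

```python
LONGITUDINAL_PERID_LIMIT = 200000


def usable_for_longitudinal(perid):
    """Identifiers exceeding 200000 are dropped for longitudinal analysis
    (they are kept for cross-sectional analysis)."""
    return perid <= LONGITUDINAL_PERID_LIMIT


def age_for_analysis(record):
    """Prefer corrected age; on Wave 2, where corrected age is 0,
    fall back to edited age (the fallback is an assumption)."""
    if record["WAVE"] == 2:
        return record["AGEED"]
    return record["CORAGE"]


def longitudinal_subset(records):
    """Keep only the person records suitable for longitudinal work."""
    return [r for r in records if usable_for_longitudinal(r["PERID"])]
```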


Month. (Level 12) This represents the
reference period for each wave. All months
in the survey have been numbered longitudinally
so that, for example, the 3 months
pertaining to Wave 2 are 4, 5, and 6.
Aside from identifying the longitudinal
reference months, this level contains
numerous fillers intended to support the
construction of longitudinal household (or
other aggregate unit) files.
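
The longitudinal month numbering can be written as a one-line formula. The sketch below covers only the regular pattern of consecutive waves with 3-month reference periods; how months are numbered for the third rotation group, which skipped Wave 4, is not spelled out here and is not modeled.

```python
MONTHS_PER_WAVE = 3


def longitudinal_months(wave):
    """Longitudinal reference months for a wave, regular schedule only."""
    first = MONTHS_PER_WAVE * (wave - 1) + 1
    return list(range(first, first + MONTHS_PER_WAVE))


def wave_of_month(month):
    """Inverse mapping: the wave whose reference period contains the month."""
    return (month - 1) // MONTHS_PER_WAVE + 1
```

For Wave 2 this yields months 4, 5, and 6, matching the example in the text.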

The remaining data available through SIPPMASTER
are stored in associated files which can
be  accessed  directly  if  desired.   A  summary  of 
the  contents  of  each  can  be  found  in  Doyle  and 
Citro  (1984). 

MH. Figure 3 describes the logical organization
of MH. It is a relatively simple hierarchical
file with five real levels and one virtual
level. This file reflects the outcome of a
complicated procedure designed to determine
monthly household and food stamp unit composition
from the data collected in the 1979 ISDP Research
Panel. Documentation on the methodology employed
in the development of this file is included in
Doyle and Citro (1984). The contents of this
file are described below, followed by a section
summarizing how it is used to develop longitudinal
units.

Unlike  SIPPMASTER,  MH  contains  a  limited 
number  of  variables.   It  is  comprised  mostly  of 
pointers  detailing  who  lived  with  whom  during 
each  month  covered  by  the  first  five  waves  of  the 
survey. The remaining variables provide descriptive
characteristics such as age and relationship
to  reference  person  which  are  necessary  to 
effectively  determine  longitudinal  units.  Each 
of  the  levels  of  MH  is  described  below. 


PSUSERIAL. (Level 1) This level contains
the scrambled values of PSUSERIAL as well as
the rotation group identifier. For the
ISDP all persons who ever resided together
have common values of PSUSERIAL, so this
level was created to increase the efficiency
of data retrieval and to minimize
storage costs.

MONTH. (Level 2) This level simply
identifies the month. Longitudinal
reference months as described for SIPPMASTER
were used. For rotation groups 1
and 2, the months range from 1 to 16 and
for rotation group 3 they range from 1 to
13. Note that household composition can be
described for one more month than is covered
by the retrospective data collected in
the ISDP. This extra time period is the
month of the final interview.

Household.  (Level  3)  This  level  describes 
who  lived  with  whom  during  each  month  and 
the  Food  Stamp  Program  participation  and 
benefits  of  that  group.  The  contents  are 
the  monthly  household  identifier  and  food 
stamp  recipiency  and  amount  variables  for 
up  to  two  food  stamp  units. 
Family. (Level 4) This is an indication of
family groupings within monthly households.
The contents are family identifier,
family type, and family kind.

Person. (Level 5) This level contains an
entry for each person for every month he or
she was present in the sample. The key to
this level is PERID, the same identifier
used in SIPPMASTER. The other variables
stored in this level are age, relationship
to reference person, marital status, food
stamp unit membership, and variables necessary
to link to SIPPMASTER.

PP.   (Level  6)  This  is  a  virtual  level  in 
MH.   The  associated  file  is  called  PD  and 
it  is  the  same  PD  file  accessed  through 
level  6  of  SIPPMASTER.   It  contains 
presence  in  sample  indicators  as  well  as 
constant  demographic  data  such  as  sex. 

The  intended  use  of  the  MH  data  base  is  to 
determine  longitudinal  units.   In  developing  the 
ISDP/RAMIS  II  system,  one  objective  was  to  allow 
researchers  flexibility  in  the  development  of  the 
definition  of  what  constitutes  the  same  unit  when 
viewed  over  time.   For  some  applications  it  may 
be  appropriate  to  define  a  unit  as  being  the  same 
from  one  month  to  the  next  if  all  adults  remain 
the  same.   For  another  application  it  may  be 
sufficient  to  only  require  that  the  reference 
person  (household  head)  be  the  same.   More 
complicated  definitions  may  be  required  in  other 
situations.   An  example  might  be  that  units  are 
the  same  if  the  composition  changes  from  one 
month  to  the  next  are  restricted  to  birth  of  a 
child,  loss  of  a  peripheral  adult,  e.g.,  an  older 
daughter  leaves  for  college,  or  a  death  of  one 
spouse  in  a  husband-wife  primary  family. 

Each of these three definitions can be specified
with the ISDP/RAMIS II system, as can many
others. The procedure is as follows. Using the
preferred  definition,  an  algorithm  for  uniquely 
identifying  each  unit  each  month  is  developed. 
In  the  second  example  above,  this  would  simply 
involve  assigning  the  PERID  of  the  reference 
person  to  the  monthly  unit  as  the  identifier. 
Next,  a  comparison  across  months  within  PSUSERIAL 
groups  is  made.   All  monthly  units  with  common 
values  of  the  newly  created  identifier  constitute 
one  longitudinal  unit.   Finally,  an  extract  is 
created  which  records  the  available  information 
organized  by  the  longitudinal  unit  identifier. 
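
The procedure just described can be sketched for the second example definition, in which a unit is the same over time as long as its reference person is the same. This is an illustration in Python, not RAMIS II code; PSUSERIAL and PERID are names from the text, while MONTH, MONTHLY_HH, and is_reference_person are hypothetical field names.

```python
from collections import defaultdict


def monthly_unit_ids(person_months):
    """Step 1: label each monthly household with the PERID of its
    reference person."""
    unit_id = {}
    for row in person_months:
        if row["is_reference_person"]:
            key = (row["PSUSERIAL"], row["MONTH"], row["MONTHLY_HH"])
            unit_id[key] = row["PERID"]
    return unit_id


def longitudinal_units(person_months):
    """Steps 2 and 3: within each PSUSERIAL group, monthly units that share
    an identifier form one longitudinal unit; return the months each covers."""
    units = defaultdict(list)
    for (psuserial, month, _hh), perid in monthly_unit_ids(person_months).items():
        units[(psuserial, perid)].append(month)
    return {key: sorted(months) for key, months in units.items()}
```

A different definition of longitudinality would change only step 1, the rule that assigns an identifier to each monthly unit.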

The  available  data  from  MH  are  primarily 
demographic,  the  exception  being  Food  Stamp 
Program  characteristics.   The  user  will  of  course 
also  desire  economic  data  to  support  the  analysis 
of  the  longitudinal  units.   This  can  be  achieved 
through  the  extraction  of  data  from  SIPPMASTER. 

Conclusion 

This  paper  describes  a  system  to  access  data 
from  a  complicated  longitudinal  survey  of  house- 
holds when  the  survey  itself  was  in  its  devel- 
opment stages.   It  represents  a  successful 
attempt  to  apply  modern  DBMS  technology  to  solve 
access  problems  posed  by  complex  social  science 


data collection efforts. Some of its features are:


o  Procedural  language  interface  to  allow 
the  use  of  FORTRAN  or  PL/1  to  conduct 
complex  applications 

o  SAS  interface  to  permit  more  sophisti- 
cated statistical  analysis. 

The  system  is,  of  course,  not  without  draw- 
backs.  For  example,  the  hierarchy  imposed  in  the 
primary  file,  SIPPMASTER,  is  cumbersome  and,  with 
recent  developments  in  relational  data  base 
technology,  unnecessary.   This  structure  could 
easily  be  simplified  today.   Furthermore,  the 
system is on-line and therefore requires large
amounts  of  disk  storage.   As  the  cost  of  mass 
storage  goes  down  with  improved  hardware  now 
being  developed,  this  will  become  less  of  a 
problem. 

In  spite  of  these  imperfections  the  ISDP/RAMIS 
II  system  works.   It  represents  the  first  truly 
integrated  ISDP  data  base  available  to  the  public 
for  research.   With  this  system  users  can  and 
indeed  have  carried  out  analyses  that  truly 
exploit  the  longitudinal  nature  of  the  data. 


FOOTNOTES 


In the publicly available data bases, PSUSERIAL
is a nine character field which uniquely
identifies  all  households  in  Wave  1.   Together 
with  person  number  it  was  originally  intended  to 
uniquely  identify  persons  followed  in  the  panel. 

A  virtual  level  is  a  level  for  which  the  data 
are  not  physically  stored  in  the  file.   Instead 
there  is  an  internal  record  of  the  location  of 
another  file  which  contains  the  information. 
With  a  DBMS,  this  other  (or  associated)  file  is 
accessed  automatically  when  data  from  it  are 
requested. 

[...] of data collection

[...] Research Center

[...] Program Participation (SIPP) and its
elaborate pretest, the 1979 Income Survey
Development Program (ISDP). Both must
address issues arising from the basic
design of longitudinal surveys of
individuals and households and it is
worth beginning with a brief review of
the sampling theory behind the SIPP
design. Since this design is so similar
to the one used in the longitudinal study
with which I am most familiar, the Panel
Study of Income Dynamics (PSID), I will
[...] heavily from the 17-year history of
that study.

Many cross-sectional surveys obtain
their samples of individuals and
households by sampling dwellings.
Longitudinal surveys can do this as well,
as evidenced by the procedures of both
the SIPP and PSID. Representative
samples of dwellings provide
representative samples of subunits within
those dwellings: households, families,
Food Stamp recipiency units, AFDC
recipiency units and individuals. The
selection probabilities of each of these
subunits are identical to the selection
probabilities of the dwelling itself.
With a properly specified set of rules
regarding the definition of units and the
tracking of those units over time, a
longitudinal study such as the SIPP or
the PSID can maintain a representative
sample of each of the various subunits
over time. This requires that newly
formed subunits of interest (families,
AFDC recipiency units, etc.) enter into
the sample with known selection
probabilities in order to reflect
corresponding changes that are taking
[...] population at large. [...]
requires that individuals be classified
as either "sample" or "nonsample" and
that explicit rules be followed
consistently in the event of dramatic
changes in the composition of units. In
the SIPP, as in the PSID, for example,
individuals who join the sample
through marriage are followed only as
long as they [...] household containing
[...]. Once they regain their
independence from all sample members,
they are no longer followed.

[...] sampling considerations require
that the study [...] good systems for
(1) tracking sample individuals,
regardless of where they go, (2) allowing
individuals to join the sample to provide
accurate information about the household
in which sample individuals reside,
(3) having a fool-proof system of
identification numbers for all
individuals, sample and nonsample, and
(4) storing the data for the individuals
and aggregations of [...] and individuals
so that an analyst can perform a variety
of analyses [...] individuals in an
efficient [...].

[...] outlined by Jean and McArthur are
quite similar to the ones that have been
used successfully by the PSID for 17
years. [...]

1.   Not all individuals who are
institutionalized appear to be carried
along as a part of SIPP households. [...]
the PSID, individuals [...] and cannot be
interviewed are associated with a sample
family for as long as they [...]
institutionalized. Of course they are
not considered part of the family for
most purposes, but the family [...]
with the means of tracking [...]
establishing contact with [...]
if they leave the institution. It may
be tempting to drop institutionalized
individuals from the sample, but there
are a substantial number of them,
especially at younger adult ages. A
strategy of dropping institutionalized
individuals in a country with a
compulsory, universal military service,
would result in all young people being
dropped from the sample! Not keeping
track of young children who move into
institutionalized housing of various
types or with relatives who are not
sample members means that the SIPP will
be unable to inform analysts about such
children. (The PSID does not follow
these young children either.) They may
be too expensive to follow, but the
decision of not following them should be
based on an appreciation of the
consequences.

2.   Model-based statisticians may not
appreciate the distinction between sample
and nonsample individuals and will lament
the fact that nonsample individuals are
dropped by SIPP once they leave sample
households. The PSID does not follow
nonsample individuals either, but perhaps
this is a mistake. Some methodological
[...] conducted by Finis Welch and his
colleagues on the PSID has detected no
significant differences [...]
behavioral models estimated for sample
and nonsample individuals. (Becketti,
et al. 1983.)

3.   The nonresponse rules for the SIPP
are not entirely clear from the Jean and
McArthur paper, especially the rules
regarding attempts to contact
nonrespondents to waves subsequent to the
first one. The PSID does not attempt to
recontact these nonrespondents and I
[...] PSID design. Evidence from the new
youth cohorts of the NLS indicates that
nonrespondents  in  one  wave  are  often 
quite  willing  to  respond  to  subsequent 
waves.   One  gets  the  impression  that 
refusals  or  contact  difficulties  are 
often  quite  transitory  in  nature. 

4.   The Jean and McArthur paper
mentions  but  does  not  emphasize  the 
importance  of  obtaining  the  name,  address 
and  phone  number  of  a  contact  person  who 
might  know  the  whereabouts  of  sample 
households  if  they  move.   More 
conventional  means  of  following 
individuals  such  as  through  forwarding 
addresses  sometimes  do  not  work  precisely 
because  the  individuals  do  not  wish  to  be 
followed  easily.   In  our  experience,  the 
contact  information  is  invaluable. 

5.   Telephone  interviewing  is 
mentioned  as  a  possible  way  of  preserving 
high  response  rates.   The  PSID  experience 
suggests  that  this  is  indeed  true  and 
that  data  quality  does  not  suffer  unduly 
from  switching  interviewing  modes. 
Indeed,  a  substantial  number  of  recontact 
calls  are  made  to  PSID  respondents  to 
clean  up  unclear  interview  information. 
Telephones  also  provide  a  means  of  not 
only  reaching  geographically  remote 
respondents  but  also  respondents  whose 
time  schedules  make  telephone  interviews 
much  more  likely  to  succeed. 

Before  turning  to  the  subject  of  the 
Doyle  and  Citro  paper  I  would  like  to 
make  a  comment  on  the  interaction  between 
the  data  collection  and  data  management. 
Too  often  we  compartmentalize  the  two 
without  realizing  how  intimately  they  are 
related.   As  illustrated  in  the  Doyle  and 
Citro  paper,  data  analysts  often  discover 
apparent  inconsistencies  or  outright 
errors  and  are  in  the  worst  position  to 
make  an  informed  judgement  about  the 
problems.   Data  collectors  ought  to 
anticipate  problems  of  this  sort  and  have 
significant  resources  allocated  to 
solving  them.   Most  of  the  problems  must 
be  resolved  by  returning  to  the  original 
protocols,  at  least  briefly,  to 
understand  the  nature  of  the  problem. 

What  now  of  the  data  structure  and 
methods  proposed  in  the  Doyle  and  Citro 
paper?   Several  basic  questions  come  to 
mind. 

1.  The  most  basic  question  to  be 
asked  of  any  proposed  data  structure  is 
"Is  it  feasible?"   That  the  proposed 
structure  has  been  used  with  success  for 
several  ISDP  projects  suggests  an 
affirmative  answer  to  this  question. 

2.  The  second  question,  more 
difficult  to  answer  from  the  information 
contained  in  the  paper,  is  "Is  it 
efficient?"  or,  more  properly  stated, 
"Under  what  circumstances  is  it 
efficient?"   Does  one  need  a  dedicated 
machine  capable  of  grinding  away 
throughout  the  night  to  select  an 
abstract  from  the  data  set  with  this 


in  which  CPU  is  priced  at  its  marginal 
cost?   I  suspect  that  the  proposed  system 
is  not  very  efficient  in  the  latter  type 
of  computing  environment  but  I  could  not 
tell  from  the  information  contained  in 
the  paper. 

3.   Since  most  "computing"  costs  are 
the  labor  costs  of  the  programmers  and 
other  analysts  rather  than  the  machine 
charges,  the  third  question  is  "Is  it 
easy  to  use?"   Apparently  once  one  has 
acquired  a  great  deal  of  specific 
training  about  the  proposed  system,  it  is 
fairly  straightforward.   But  outside 
analysts  are  encouraged  to  consider 
avoiding  the  data  abstracting 
complications  by  delegating  that  work  to 
those  who  are  more  familiar  with  the 
system. 

The  data  structure  that  is  proposed  is 
modelled  after  the  exceedingly  complex 
file  structure  used  by  the  Census  Bureau. 
Surely  there  is  a  simpler  method  than  an 
eight-level  hierarchy  for  each  wave  and 
four  files  each  with  a  fifteen-level 
hierarchy  and  a  completely  separate  six- 
level  hierarchy  that  can  be  used  to  sort 
out  different  aggregations  of 
individuals.   The  PSID  files  are  more 
complicated  in  that  they  have  more  waves 
of  data  but  are  simpler  in  that  they  are 
in  only  one  aggregation — the  family.   It 
[...] family history and the individual. The
term  "family  history"  is  chosen  with  care 
because a major insight, obvious now but
not during the first twelve years of the
study, concerns the data structure
implications that stem from the fact that
not  all  individuals  in  a  given  family  in 
the  most  recent  wave  share  the  same 
"family  history".   In  fact,  we  have  about 
seven  thousand  current  families  but  over 
nine  thousand  family  histories.   The 
first  level  of  the  hierarchy,  then,  is 
the  family  history;  the  second  level 
consists  of  the  individual  histories  of 
all  of  the  individuals  who  share  that 
same  family  history.   One  could  also 
construct  "household  histories",  "Food 
Stamp  recipiency  histories",  etc.  as 
additional  hierarchical  levels  or  as 
separate  records  in  a  networking  data 
structure.   These  simpler  hierarchies 
require  that  some  of  the  information  from 
the  individual  data  record  be  aggregated 
into  the  family  or  household  record  and 
this  work  is  probably  best  done  at  the 
Census  Bureau  rather  than  having  outside 
analysts  attempting  to  do  this  with  the 
information  they  have  at  their  disposal. 
A  final  comment  concerns  a  limitation 
again  attributable  to  the  way  in  which 
the  Census  Bureau  processes  its  data 
rather  than  to  the  organizations  such  as 
Mathematica  Policy  Research  that  attempt 
to  make  sense  out  of  it.   Implicit  in  the 
file  structure  is  the  assumed  need  to 
aggregate  individuals  into  households  or 
other sensible units, but not the
possible need to relate individuals to
one  another.   One  could  think  of  a  file 
in  which  all  sample  individuals  had  data 
records  that  contained  information  on  all 
individuals who had been or were about to
be  related  to  them  in  some  way  (by  blood, 
marriage,  adoption,  or  sharing  the  same 
dwelling).   Information  on  the  related 
individuals  would  include  wave  by  wave 
(or,  in  the  case  of  SIPP  and  ISDP,  month 
by  month)  information  on  how  the 
individuals  were  related  and  whether  they 
shared  the  same  dwelling,  family, 
household,  Food  Stamp  recipiency  unit, 
etc. For most purposes this would be the
most  general  file  structure  for  SIPP, 
enabling  the  analysts  to  distinguish  step 
children  from  natural  children,  ex- 
spouses,  and  other  relatives  so  that  one 
could  analyze  the  economic  consequences 
of  divorce,  etc.   This  would,  of  course, 
require a great deal more information
than is currently provided in the
Census Bureau's "relation to
head" coding. But the added detail would
enable  the  construction  of  a  file 
structure  that  would  be  of  greatest  use. 
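
One way to picture the person-to-person file suggested above is the following Python sketch. It is an assumption-laden illustration rather than a proposed Census Bureau format: the class names (`MonthlyLink`, `PersonRecord`), the field names, and the relationship labels are all hypothetical. Each sample individual carries, for every related person, a month-by-month history of the relationship and of shared unit membership.

```python
from dataclasses import dataclass, field


@dataclass
class MonthlyLink:
    """One month of information on how two people are related (hypothetical)."""
    month: int
    relationship: str      # e.g. "spouse", "ex-spouse", "stepchild"
    same_dwelling: bool
    same_family: bool


@dataclass
class PersonRecord:
    """A sample individual plus links to everyone ever related to them."""
    person_id: str
    links: dict = field(default_factory=dict)  # related id -> list of MonthlyLink

    def add_link(self, other_id, link):
        self.links.setdefault(other_id, []).append(link)

    def related_in_month(self, month, relationship):
        """Ids of persons holding the given relationship in the given month."""
        return sorted(
            other_id
            for other_id, history in self.links.items()
            if any(l.month == month and l.relationship == relationship
                   for l in history)
        )


person = PersonRecord("A")
person.add_link("B", MonthlyLink(1, "spouse", True, True))
person.add_link("B", MonthlyLink(2, "ex-spouse", False, False))
person.add_link("C", MonthlyLink(1, "stepchild", True, True))
person.add_link("C", MonthlyLink(2, "stepchild", True, True))

print(person.related_in_month(1, "spouse"))     # B is a spouse in month 1
print(person.related_in_month(2, "ex-spouse"))  # and an ex-spouse in month 2
```

With records of this kind an analyst could follow both partners after a divorce and compare their economic circumstances month by month, which the "relation to head" coding alone does not permit.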
References 
Becketti,  Sean;  Gould,  William; 

The papers by J. C. Moore and D. Kasprzyk, A.
M.  Roman  and  D.  V.  O'Brien,  and  R.  A.  Kulka  are 
important  efforts  to  examine  sources  of  error 
other  than  that  due  to  sampling.   I  would  hope 
that  this  concern  with  nonsampling  sources  of 
variance  continues  throughout  the  program  of 
evaluation  and  research  on  the  new  SIPP,  and  that 
the  results  are  reflected  in  reports  coming  out 
of  the  SIPP.  There  is  ample  reason  to  suspect 
that,  in  a  survey  as  complex  and  difficult  as  is 
the  SIPP,  error  due  to  sampling  will  be  swamped 
by  error  due  to  miscommunication  between  respon- 
dent and  interviewer,  respondent  dissembling, 
respondent  ignorance,  etc. 

The  Moore  and  Kasprzyk  paper  tackles  the  prob- 
lem that  the  ISDP  (the  pilot  study  for  the  SIPP) 
can  be  seen  as  having  measured  more  change  be- 
tween waves  than  within  waves.   The  authors  argue 
persuasively  that  this  cannot  be  reflective  of 
true  conditions,  citing  a  number  of  reasons  for 
thinking  that  not  only  did  respondents  make 
errors  that  resulted  in  the  appearance  of  change 
between  waves  but  also  that  post-interview  pro- 
cessing may  have  contributed  significantly  to 
this  phenomenon. 

Their  analysis  heavily  depends  on  the  correct 
matching  of  persons  across  waves.   An  earlier 
analysis,  by  Graham  Kalton  and  James  Lepkowski, 
had  depended  on  a  file  that  contained  many  iden- 
tifiers with  erroneous  codes,  making  matches  very 
prone  to  error.   Moore  and  Kasprzyk  used  instead 
a  "definitive"  data  file  produced  by  Mathematica, 
which  linked  data  for  the  first  five  of  the  six 
waves.   The  authors  assert  that  this  "apparently" 
corrected  the  matching  problems  by  correcting 
person  identifiers.   This  is  a  curiously  ambigu- 
ous way  of  describing  what  must  be  one  of  the 
central  assumptions  of  the  analysis.   If  it  is 
not  now  possible  for  the  Bureau  of  the  Census  to 
construct  its  own  matched  file,  then  I  suggest 
that  a  second-best  alternative  is  to  document 
the  matching  process  used  by  Mathematica  and  to 
obtain  an  independent  validation  of  that  process. 
This  is  particularly  important  given  the  research 
now  beginning  on  the  ISDP,  with  support  from  NSF 
and  other  agencies. 

The  authors  do  point  out  the  intriguing  possi- 
bility that  the  matched  file  processed  by  Kalton 
and  Lepkowski  contained  a  higher  proportion  of 
correct  matches,  although  fewer  matches  in  all 
than  does  the  Mathematica  file.   Reporting  that 
even  a  low  rate  of  mismatching  can  produce  the 
level  of  between-wave  change  observed  (on  the 
basis of a computer simulation), Moore and Kasprzyk
raise the possibility that post-interview processing
may have played a large role in the
results  obtained. 

The  paper  points  indirectly  to  one  of  the  big 
methodological  questions  facing  the  SIPP:  when 
data  are  not  only  collected  longitudinally  (as  in 
the CPS and the SIPP), but are also to be analyzed
longitudinally (as in the SIPP), it becomes
necessary  to  examine  and  perhaps  rethink  esta- 
blished procedures:  editing  for  consistency  with- 
in records,  imputation  for  item  nonresponse,  sub- 
stitution of  persons  for  missing  responses,  sam- 
ple weighting  across  time,  etc.   Despite  years  of 
experience  with  the  PSID  and  the  NLS,  for  example, 


there  are  not  yet  widely-accepted  solutions  to 
such  problems.   With  the  SIPP  demanding  solutions 
it  is  imperative  to  undertake  research  now. 

The Roman and O'Brien paper focuses on one
experiment  of  the  ISDP,  the  comparison  of  data 
obtained  from  college  students  living  away  from 
home  with  data  obtained  from  proxies,  usually 
their  parents.   The  experiment  was  conducted  dur- 
ing November  and  December,  certainly  not  the  best 
months  to  find  students  resident  at  school. 

Facts,  fate,  and  perhaps  a  few  gremlins  took 
whacks  at  the  sample  size.   Over  one-quarter  of 
the  students  who  were  identified  were  not  inter- 
viewed, because  their  school  was  more  than  50  miles 
distant.   Not  all  parents  gave  permission  for  the 
interview  of  their  students.   Not  all  students 
were at home. Not all of the completed interviews
could be matched with parent interviews. From a
potential  sample  of  443  students  identified  as 
usually  living  away  from  home,  the  result  is  a 
sample  of  only  167  matched  proxy-student  records. 
One  could  argue  that  the  failure  to  match  data  is 
the  most  fundamental  form  of  discrepancy  in 
parent-student  comparisons,  and  in  that  event  the 
true  sample  size  is  somewhat  larger,  but  even  with 
this  increase  in  sample  size  it  is  difficult  to 
generalize  from  the  results,  with  so  large  a  pro- 
portion of  the  sample  lost. 

The  results  seem  intuitively  right:  better  jobs 
come  to  the  attention  of  parents  more  than  low- 
paying  ones;  jobs  with  fewer  hours  also  attract 
their  attention  less.   So  the  proxy  data  are  more 
like  the  data  provided  by  the  students  for  the 
big-ticket  items.   I  wonder  whether  this  result  is 
more  general:  is  the  income  of  any  lower-earning 
member  of  a  household,  whether  a  student,  an  aged 
relative,  or  a  spouse,  less  well-reported  by  the 
principal  earner  in  the  household  than  is  the 
principal  income?   Is  there  a  general  tendency  to 
underestimate  or  otherwise  misreport  the  less 
crucial  items  in  a  family  budget? 

Kulka  makes  the  important  point  that  the  1979 
Research  Panel  of  the  ISDP  was  not  primarily 
designed  as  a  substantive  data  collection  instru- 
ment but  instead  as  a  flexible  vehicle  for  a  num- 
ber of  experiments  in  the  technical  and  operation- 
al problems  of  an  income  survey.   This  means  not 
only  that  the  ISDP  is  a  rich  data  resource  for 
methodological  research  but  also  that  substantive 
research  must  take  account  of  a  variety  of  design 
effects. 

Because  funding  for  ISDP  research  was  termina- 
ted in  1982 — just  when  the  data  sets  were  becoming 
available  for  research — a  number  of  the  experi- 
ments described  by  Kulka  have  been  underanalyzed. 
Kulka's paper raises more questions than can be
answered,  as  a  consequence. 

The  ISDP  results  that  offer  some  confidence  in 
the data were based on the particular design adopted
in the ISDP. As that design was not transplanted
to the SIPP, we must not read Kulka's
paper  as  indicative  of  the  quality  of  data  to  be 
derived  from  the  SIPP.  The  SIPP  may  be  better, 
or  it  may  not.  The  same  sort  of  research  agenda 
planned  for  the  ISDP  is  needed  for  the  SIPP,  so 
that  we  can  have  the  confidence  in  the  SIPP  data 
that  we  can  now  have  in  the  ISDP. 

CASE STUDIES IN PANEL

SURVEY  DESIGN: 

THE  INTERNATIONAL  EXPERIENCE 

SESSION  V 


This  section  is  comprised  of  four  papers  presented 
in  this  session  which  was  sponsored  by  the  Section 
on  Social  Statistics. 


THE  SURVEY  OF  INCOME  AND  PROGRAM  PARTICIPATION 
Roger  A.  Herriot  and  Daniel  Kasprzyk,  Bureau  of  the  Census 


Introduction 

In  October  1983,  the  Bureau  of  the  Census  con- 
ducted the  first  interviews  of  the  Survey  of 
Income  and  Program  Participation  (SIPP).  The 
SIPP  is  a  nationally  representative  household 
survey  intended  to  provide  information  on  all 
sources  of  cash  and  noncash  income,  eligibility 
and  participation  in  various  government  transfer 
programs,  disability,  labor  force  status,  assets 
and  liabilities,  pension  coverage,  taxes,  and 
many  other  items.  Data  from  the  survey  will 
provide  a  multiyear  perspective  on  changes  in 
income,  and  their  relationship  to  participation 
in  government  programs,  changes  in  household 
composition,  and  so  forth.  The  purpose  of  this 
paper is to review the need for a new survey and
to describe briefly the research and development
work leading up to the SIPP and the design
features, procedures, and content of the survey.
Data  products  and  current  survey  and  research 
activities  will  also  be  described. 

The  Need  for  a  New  Survey 

The  development  of  SIPP  arose  in  response  to  a 
recognition  that  the  principal  source  of  informa- 
tion on  the  distribution  of  household  and  per- 
sonal income  in  the  United  States,  the  March 
Income  Supplement  of  the  Current  Population  Sur- 
vey (CPS),  had  limitations  which  could  only  be 
rectified  by  making  substantial  changes  in  the 
survey  instrument  and  procedures.  For  example, 
the  CPS: 

1)  does  not  measure  monthly  income  flows 
and  month-to-month  changes; 

2)  provides  annual  income  estimates,  where- 
as eligibility  for  most  Federal  programs 
is  based  on  a  monthly  accounting  period; 

3)  produces  estimates  of  last  year's  income 
based  on  current  household  membership; 

4)  does  not  measure  asset  holdings  and 
liabilities  and  does  not  provide  enough 
measures  of  categorical  information  to 
produce  sound  estimates  of  program 
eligibility;  and 

5)  underestimates  income  from  transfer  pro- 
grams, retirement  and  disability  income, 
unemployment  compensation,  and  property 
income. 

Because  of  these  limitations,  the  Income 
Survey  Development  Program  (ISDP)  began.  The 
purpose  of  the  ISDP,  authorized  in  1975,  was  to 
design and prepare for a major new survey, the
Survey  of  Income  and  Program  Participation 
(SIPP).  The  development  effort  was  directed  by 
the  Office  of  the  Assistant  Secretary  for  Plann- 
ing and  Evaluation  in  the  Department  of  Health 
and  Human  Services  and  was  carried  out  jointly 
with the Bureau of the Census, which assisted in
the planning and carried out the field work, and
the Social Security Administration (SSA), which
administers the major cash income security programs.

The ISDP developed methods intended to overcome
the three principal shortcomings of the
CPS: 1) the underreporting of property income
and other irregular sources of income; 2) the
underreporting and misclassification of participation
in major income security programs and
other  types  of  information  that  people  generally 
find  difficult  to  report  accurately  (for  example, 
monthly  detail  on  income  earned  during  the  year); 
and  3)  the  lack  of  information  necessary  to 
analyze  program  participation  and  eligibility. 

The  principal  method  by  which  the  CPS  short- 
comings and  the  new  data  requirements  were  to 
be  addressed  was  the  use  of  a  longitudinal  survey 
design.  Persons  at  sample  addresses  were  inter- 
viewed about  their  income  and  other  character- 
istics for  the  previous  3  months.  They  were 
then  recontacted  at  regular  intervals,  having 
been  followed  to  new  addresses  if  necessary,  and 
asked additional questions to cover the intervening
period. Any other persons that they had
moved  in  with,  or  vice  versa,  were  also  inter- 
viewed. This  continued  for  15  months,  and  ended 
with  a  set  of  questions  on  taxes.  In  this  way  a 
highly  detailed  record  was  built  up  for  each 
person  and  household  for  an  entire  calendar  year. 
This  design  minimized  the  need  for  sample  persons 
to recall the details of income and other characteristics
for more than a few months and reduced
the number of questions that had to be answered
in each interview.

Because  less  time  was  required  to  update  the 
basic  information  after  the  first  interview, 
time  was  available  in  later  interview  waves  to 
ask  questions  about  other  topics  that  were  either 
stable  enough  not  to  require  periodic  updating — 
marital  history  and  pension  coverage,  for 
example--or  emerging  issues  of  onetime  interest, 
such  as  emergency  energy  assistance.  This  design 
enabled  a  set  of  core  questions  on  income  and 
other  eligibility  determinants  to  be  developed 
well in advance, thereby ensuring timely processing
and rapid turnaround while leaving interview
time to add questions on new policy issues
on  short  notice. 

Much  of  the  work  of  the  ISDP  centered  around 
four  experimental  field  tests  that  were  conducted 
to  examine  different  concepts,  procedures,  ques- 
tionnaires, recall  periods,  and  the  like.  Two 
of the tests were restricted to a small number of
geographic  sites;  the  other  two  were  nationwide. 
In  the  first  nationwide  test,  the  1978  Research 
Panel, approximately 2,000 households were interviewed.
Because of the relatively small number
of  interviews,  controlled  experimental  compari- 
sons of  alternatives  were  not  possible;  however, 
the  panel  did  demonstrate  the  feasibility  of 
many  new  ideas  and  methods.  It  also  laid  a 
foundation  for  the  largest  and  most  complex 
test, the 1979 Research Panel. This panel consisted
of a nationally representative sample of
8,200  households  and  provided  a  vehicle  for 
feasibility  tests  and  controlled  experiments  of 
alternative  design  features.  Although  used 
primarily  for  methodological  purposes,  it  was 
sufficiently  large  to  provide  reliable  national 
estimates  of  many  characteristics  of  interest  to 
analysts. (Public-use microdata files and
documentation of the 1979 Research Panel are
available through the National Technical Information
Service. 1/) A more detailed discussion
of the ISDP and its activities is provided in
Ycas and Lininger (1981) and David (1983).
Because  the  ISDP  was  the  predecessor  to 
SIPP,  it  is  not  surprising  that  many  character- 
istics of  the  ISDP  are  reflected  in  the  SIPP 
design,  including  the  survey  design,  content, 
and  questionnaire  format. 

SIPP  Design  Features 

SIPP  started  in  October  1983  as  an  ongoing 
survey  program  of  the  Bureau  of  the  Census  with 
one  sample  panel  of  approximately  26,000  "design- 
ated" households  in  174  primary  sample  units 
(PSU's) selected to represent the noninstitutional
population of the United States. The
actual  sample  size  was  somewhat  smaller  (about 
21,000  households)  because  some  of  the  selected 
households  were  unoccupied,  demolished,  converted 
for  nonresidential  use,  or  occupied  by  persons 
not  eligible  for  interview,  such  as  persons 
maintaining  a  usual  residence  elsewhere.  The 
sample  design  is  self-weighting;  that  is,  each 
unit  selected  in  the  sample  has  the  same  probab- 
ility of  selection. 

Each  household  is  interviewed  once  every  4 
months  for  2  1/2  years  to  produce  sufficient 
data  for  longitudinal  analyses  while  providing  a 
relatively  short  recall  period  for  reporting 
monthly  income.  The  reference  period  for  the 
principal survey items is the 4 months preceding
the interview. For example, in October, the
reference period is June through September; when
the household is interviewed again in February,
it is October through January. This interviewing
plan  will  result  in  eight  interviews  per  house- 
hold. 
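
The reference-period rule can be checked with a short Python sketch. The function name `reference_period` and the month-offset count of interviews are assumptions made for illustration; the only facts taken from the text are the 4-month reference period, the 4-month interview interval, and the 2 1/2 year panel length.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def reference_period(interview_month, length=4):
    """Return the `length` calendar months preceding the interview month."""
    idx = MONTHS.index(interview_month)
    return [MONTHS[(idx - k) % 12] for k in range(length, 0, -1)]


# Interviews every 4 months across a 30-month (2 1/2 year) panel,
# counted as month offsets 0, 4, 8, ... from the first interview.
interviews_per_household = len(range(0, 30, 4))

print(reference_period("October"))
print(reference_period("February"))
print(interviews_per_household)
```

The modulo arithmetic is what lets a February interview reach back across the year boundary to October.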

In  February  1985  and  then  every  January  there- 
after, a  new,  slightly  smaller  panel  will  be 
introduced. (Figure 1 illustrates the plan for
implementing  sample  panels  in  1984  through  1987.) 
This  design  will  allow  cross-sectional  estimates 
to  be  produced  from  a  combined  sample  of  approx- 
imately 35,000  households. 

Finally,  to  facilitate  field  operations,  each 
sample  panel  is  divided  into  four  approximately 
equal  subsamples,  called  rotation  groups;  one 
rotation group will be interviewed in a given
month. Thus, one cycle or "wave" of interviewing
takes 4 consecutive months. This design creates
manageable interviewing and processing workloads
each month instead of one large workload every
4 months; however, it results in each rotation
group  using  a  different  reference  period.  Figure 
2  provides  an  illustration  of  the  relationship 
between  waves,  rotation  groups,  interview  months, 
and  reference  periods. 

To recap, the panels are an important feature
of the SIPP design: a new panel is initiated each
year. There are also waves or interviews in each
panel. Finally, there are rotation groups within
each wave. Four rotation groups make up a wave,
each consisting of approximately one-fourth of
the total sample. Since the reference period is
the 4 months prior to the interview, each
rotation group has a different reference period.

The collection of data on a "staggered" basis
produces 7 months of data under a 4-month
reference period. This occurs because a full
sample of cases during the wave is not available
for each month of the reference period. Looking
again at figure 2, January interviews obtain
data for the period September through December;
February interviews, for October through January;
March interviews, for November through February;
and April interviews, for December through March.
When individual wave files are used,
the September data will be available only for
the first rotation group; the October data, the
first two rotation groups; the November data, the
first three rotation groups; and so forth.
Because of the design, however, matching individual
wave  data  together  will  allow  monthly  analysis 
on  the  full  sample.  Although  SIPP  is  essentially 
a  monthly  survey,  the  staggered  design  and  the 
consequently  staggered  reference  period  only 
permit  analysis  on  a  full  sample  for  one  month 
of  each  wave. 
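
The coverage pattern of a single wave can be tabulated with a small sketch. It assumes four rotation groups interviewed in consecutive months, each reporting on the 4 preceding months, as described above; the function name `wave_coverage` is hypothetical.

```python
from collections import Counter

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def wave_coverage(first_interview_month, n_groups=4, ref_length=4):
    """Count how many rotation groups report on each calendar month.

    first_interview_month is 1-based; rotation group g is interviewed
    g months after the first group (g = 0, 1, 2, 3).
    """
    coverage = Counter()
    for g in range(n_groups):
        interview = first_interview_month + g
        for k in range(1, ref_length + 1):
            # k months before the interview month, wrapping over the year
            coverage[MONTHS[(interview - 1 - k) % 12]] += 1
    return coverage


# Wave with interviews in January through April
cov = wave_coverage(1)
full_sample_months = [m for m, n in cov.items() if n == 4]
```

For interviews in January through April, the count rises from one rotation group for September to all four for December and falls back to one for March, so seven calendar months appear but only one has the full sample.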

SIPP  Collection  Procedures 

Data  collection  operations  are  managed  through 
the  Census  Bureau's  12  permanent  regional  offices. 
Interviewers  assigned  to  these  offices  conduct 
one  personal  visit  interview  with  each  sampled 
household  every  4  months.  At  the  time  of  the 
interviewer's  visit,  each  person  15  years  old  or 
older  who  is  present  is  asked  to  provide  inform- 
ation about  himself/herself;  a  proxy  respondent 
is  asked  to  provide  information  for  those  who 
are not available. Telephone interviewing is
permitted  only  to  obtain  missing  information  or 
to  interview  persons  who  will  not  or  cannot 
participate  otherwise. 

The  average  length  of  the  interview  is  about 
30 minutes. An important design feature of SIPP
is that all persons in a sampled household at
the time of the first interview remain in the
sample even if they move to a new address during
the next 2 1/2 years. For cost and operational
reasons, personal-visit interviews are conducted
only at new addresses that are in or within
100 miles of a SIPP primary sampling unit.
After the first interview, the SIPP sample is a
person-based sample, consisting of all individuals
who were living in the sample unit at the
time of the Wave 1 interview. Individuals aged
15  and  over  who  subsequently  share  living 
quarters  with  the  original  sample  people  will 
also  be  interviewed  in  order  to  provide  the 
overall  economic  context  of  the  original  sample 
persons.  Changes  in  household  composition  caused 
by  persons  who  join  or  leave  the  household  after 
the  first  interview  are  also  recorded.  These 
individuals  are  interviewed  as  long  as  they 
reside  with  an  original  sample  person. 

Another  important  feature  of  SIPP  is  the 
identification  numbering  system.  Each  person 
will be assigned a unique fourteen-digit
identification (ID) number at the time he/she enters
the sample; an additional two digits will be
assigned if the person moves to a new address.
A master list of identification numbers will be
used  by  the  regional  offices  to  monitor  the 
status  of  interviewing  each  month  after  Wave  1. 
The  regional  offices  will  be  responsible  for 
ensuring  that  there  is  a  completed  questionnaire 
(or  reason  for  noninterview)  for  each  number  on 
or household, as well as income. Some of the
major types of assets collected are dividend
earnings, home equity, durable goods, and
unsecured liabilities. This module, unlike the
core data on assets, collects the value of the
assets such as the market value of real estate,
vehicles owned, and stocks held. This module
will be administered twice in each panel, in
waves one year apart.

Similarly, a module of questions on tax-related
information will be fielded twice in each panel.
Information on filing status and taxes paid
(income, property, and social security (FICA)
taxes) will be obtained to allow the estimation
of tax incidence and disposable income and the
simulation of tax policy alternatives. In
addition to tax-related matters, an annual "round-up"
module  will  also  be  administered  to  obtain  wage 
and  salary  data  from  W-2  forms  and  estimates  of 
annual  self-employment  and  property  income  for 
each  appropriate  person. 

Other  fixed  topical  modules  which  will  be 
administered  in  only  one  wave  of  the  survey 
include:  1)  marital  history,  2)  fertility,  and 
3)  migration. 

SIPP  Content:  Variable  Topical  Modules 

In response to program planning and policy
analysis  data  requirements,  the  final  component 
of  the  SIPP  content  consists  of  modules  of 
questions  which  will  not  be  a  recurring  feature 
of  each  SIPP  panel. 

Brief descriptions of several of these variable
topical modules will indicate the breadth
of information collected. For example, in the
third interview a module of questions on health
care utilization and planning, and social services
in-kind (health and training) was asked.
In  the  fourth  interview,  a  retirement  and  pension 
module  contained  questions  on  coverage  and 
vested  rights  in  retirement  and  pension  plans. 
The  data  will  help  in  the  analysis  of  how  net 
worth  is  related  to  retirement  decisions  and 
will  allow  a  comparison  of  the  Social  Security 
system  with  private  retirement  plans. 

Another topical module in the fourth interview
collected characteristics of households that
affect energy usage. These data will help
provide a better estimate of income remaining
after all housing needs are met and fulfill a
need for information concerning energy usage.
They will also provide information to allow the
simulation and analysis of individuals and
households  qualifying  for  energy  assistance 
programs. 

In the fifth interview, a child care topical
module has been developed to obtain information
about  child  care  arrangements,  such  as  who 
provides  the  care,  the  number  of  hours  of  care 
per  week,  where  the  care  is  provided,  and  the 
cost  of  the  care.  These  data  will  be  useful 
because  child  care  expenses  are  a  major  part  of 
work-related  expenses  and  are  frequently 
deductible  for  program  eligibility  purposes. 

In  the  same  interview,  questions  on  welfare 
history  and  child  support  will  help  determine 
the  length  of  time  persons  receive  aid  from 
specific  welfare  programs,  as  well  as  provide 
information on child support agreements. The
data from the welfare history questions will
measure the extent to which persons and households
have been dependent upon government transfer
programs. Questions concerning child
support will measure the degree to which the
failure of the father to provide child care
affects the likelihood of the mother and
children's participation in government transfer
programs. 

A  topical  module  on  reasons  for  not 
working/reservation  wage  will  include  questions 
to ascertain why persons are not in the labor
force. The data collected will aid the
understanding of the conditions required for
unemployed persons to accept a job.

Yet  another  topical  module  will  contain 
questions  about  providing  regular  payments  for 
the  support  of  persons  who  are  not  members  of 
the SIPP household and about expenses associated
with  a  person's  job.  Those  questions  will  help 
in  obtaining  a  measure  of  the  fixed  financial 
obligations  of  persons,  resulting  in  a  more 
complete  picture  of  their  economic  situation. 

As  can  be  seen,  a  wide  variety  of  topics  are 
covered  under  the  aegis  of  the  variable  topical 
module concept. The breadth of these data in
combination with the income and asset information
ensures that SIPP will be a widely used and
powerful data base serving many purposes.

SIPP  Data  Products 

A  number  of  publications  and  public-use  data 
files  will    be  generated  from  the  information 
collected  in  SIPP.     Both  publications  and  data 
files  are  identified  by  whether  they  are  cross- 
sectional    or  longitudinal.     Two  types  of  cross- 
sectional   reports  are  planned  by  the  Census 
Bureau:     1)   a  set  of  quarterly  and  annual    reports 
that focus on core information; and 2) a set
of  periodic  or  single-time  reports  using  the 
detailed  data  from  the  topical  modules. 

The  quarterly  cross-sectional    reports  provide 
average  values  for  a  variety  of  labor  force, 
income, and household composition measures based
on monthly averages. The first quarterly report
was issued in fall 1984 and contained income and
labor force data referring to the third quarter
of 1983. The annual reports will be similar in
content, but will show values averaged across 12
months rather than 3 months. The periodic and
single-time reports will use the detailed data
from the topical modules to examine issues
related to income and program participation. These
reports may also focus solely on the material
covered in a topical module such as work history
or  migration. 

Plans  for  longitudinal    data  reports  are  under 
discussion.     Six  kinds  of  reports  have  been 
proposed  for  consideration  (McMillen  and  Kasprzyk 
(1984)): 

1) economic profile reports, presenting
yearly aggregates of monthly data on
individuals, reporting household and
family information as characteristics of
individuals;

2)  comparative  profile  reports,  presenting 
comparisons  of  yearly  aggregates  of 
monthly  data  on  individuals; 

3) transition reports, providing changes in
income and program participation status
between two points in time;

4) multiple transition reports, providing
patterns of labor force, income, and
program participation activity, and the
number of spells in a given state and
their duration;

5) longitudinal family and unrelated
individual reports, presenting the
characteristics of longitudinal family
units defined in SIPP (see McMillen and
Herriot (1984)); and

6)  special   event  reports,  providing  data 
related  to  an  event,   such  as  marriage, 
divorce,  separation,  the  birth  of  a 
child,  a  return  to  school,  or  a  move  to 
a  new  address. 

SIPP cross-sectional data files will be issued
on a wave-by-wave basis. Each file will include
person, family, and household information
collected in the survey wave. Almost all data
obtained on the questionnaire will be included
on the files; certain summary income recodes will
also be included. Data that might disclose the
identity of a person will be excluded or recoded
in accordance with standard Census Bureau
confidentiality restrictions. Wave files will be
edited, imputed, and weighted in a manner
consistent with their use for cross-sectional
analysis. A unique identification number will be
included to allow users to merge two or more SIPP
files. However, since the processing of wave
files is independent, wave-to-wave data
inconsistencies will occur and the user must be
prepared to resolve them.
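
Merging two wave files on the identification number, and finding the inconsistencies a user would then have to resolve, might look like the following sketch. The record layout and the field names (`person_id`, `sex`, `birth_year`) and ID values are invented for illustration; the actual SIPP file layout is not described here.

```python
def merge_waves(wave_a, wave_b, check_fields=("sex", "birth_year")):
    """Match person records from two wave files on the ID number.

    Returns the matched record pairs and a list of conflicts in fields
    that should not change from one wave to the next.
    """
    by_id_b = {rec["person_id"]: rec for rec in wave_b}
    merged, conflicts = [], []
    for rec_a in wave_a:
        rec_b = by_id_b.get(rec_a["person_id"])
        if rec_b is None:
            continue  # person has no record in the second wave
        merged.append((rec_a, rec_b))
        for field in check_fields:
            if rec_a[field] != rec_b[field]:
                conflicts.append(
                    (rec_a["person_id"], field, rec_a[field], rec_b[field]))
    return merged, conflicts


# Hypothetical records: wave files are processed independently, so a
# fixed characteristic can disagree between them.
wave1 = [{"person_id": "00000000000101", "sex": "F", "birth_year": 1950},
         {"person_id": "00000000000102", "sex": "M", "birth_year": 1948}]
wave2 = [{"person_id": "00000000000101", "sex": "F", "birth_year": 1951},
         {"person_id": "00000000000103", "sex": "M", "birth_year": 1970}]

merged, conflicts = merge_waves(wave1, wave2)
```

In this invented example only one person appears in both waves, and that person's birth year differs between the two files, which is the kind of wave-to-wave inconsistency the user must decide how to reconcile.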

Plans  for  producing  public-use  files  designed 
for  longitudinal   analysis  are  less  well-defined 
at this time. The basic longitudinal file for
SIPP is a calendar-year file of core income data;
that is, the data items on the file will be
monthly observations for 12 months of the
calendar year for each person in the sample.
Thus, the first of the series would be a
calendar-year (CY) 1984 file of persons from the
1984 Panel.

SIPP  Unit  Nonresponse  Rates 

The first SIPP interviews were conducted in
October 1983. At this time, cross-sectional
unit noninterview rates are available for the
first three waves of SIPP. Unit noninterview
rates (type A rates) provide a measure of the
success or failure of the SIPP field work. While
refusals are the largest part of the type A
rate, it also includes "no one home" and
"temporarily absent" households. Figure 3 provides an
overall look at these rates. In Wave 1 (all
rotation groups), the type A rate was 4.87
percent; in Wave 2, 3.73 percent; in Wave 3,
5.59 percent. These rates are an improvement on
the rates experienced in the Income Survey
Development Program.

SIPP-Related  Research:     Discussion  of  SIPP 
Research  Issues 

Interruptions in the funding support for SIPP
in 1982 resulted in a cessation of the analysis
associated with the pilot surveys of the development
program. These surveys provided a large
body of data to address a number of important
methodological and substantive issues. Many of
these issues were raised by Kasprzyk (1983) and
discussed further in such forums as the Office
of Management and Budget's (OMB) SIPP Advisory
Committee, the Social Science Research Council's
Subcommittee on the SIPP, the staff of the
Committee on National Statistics, and the American
Statistical Association's Census Advisory
Committee. This year's meetings of the American
Statistical Association are being used to bring
the research community up-to-date on a variety
of SIPP-related research issues. Topics, both
methodological and substantive, are covered in
four  sessions  organized  under  the  auspices  of  the 
Social   Statistics  and  Survey  Research  Methods 
Sections. 

Finally,  a  SIPP  working  paper  series  has  been 
established  as  a  mechanism  to  provide  timely  and 
widespread access to information developed as
part  of  the  SIPP.     Papers  in  the  series  will 
cover  a  broad  range  of  topics  including: 

1)  procedural  information  on  the  collection 
and  processing  of  data; 

2)  survey  methodology  research;  and 

3)  preliminary  substantive  results,  such  as 
the  measurement  of  household  composition 
change  over  time. 

A  substantial   effort  has  been  made  by  the  SIPP 
staff  to  exchange  ideas  related  to  SIPP  research 
with  the  research  community.     Several   areas  of 
research  are  described  below. 

SIPP  Research:     Accessing  SIPP  Data 

Processing experience with data collected during
the development program has shown that the
complexity of the data, especially its longitudinal
aspects, results in severe difficulties
for the analysts. Indeed, the structure of the
SIPP cross-sectional data files will seem very
complicated to most users. The structure,
described by Fink (1984), consists of a five-level
hierarchy (sample unit, address or household,
family, person, income types) with multiple
record types in the fifth or lowest level. This
structure, while chosen to provide maximum
flexibility for cross-sectional data analysis and
to simplify the merging of multiple waves of
data into a longitudinal data base, does suggest
that  alternative  ways  of  accessing  SIPP  data  are 
necessary. 

Acknowledging  the  data  access  difficulties, 
Census Bureau staff has been working with an OMB
subcommittee  comprised  of  representatives  from 
various  Federal   agencies.     This  subcommittee  has 
proposed  the  development  of  an  "alternative"  SIPP 
cross-sectional   file.     The  proposed  file  has  a 
rectangular  structure  and  the  individual   is  the 
basic  unit  of  analysis.     At  this  time  it  seems 
that  the  content  of  the  rectangular  file  will   not 
differ  substantially  from  the  more  complicated 
file. 
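
The difference between the five-level hierarchy and a rectangular, person-based file can be sketched as follows. The nested dictionary layout, the key names, and the `inc_` column prefix are assumptions made for illustration only; they do not reproduce the actual file formats discussed above.

```python
def flatten(sample_units):
    """Turn a sample unit > household > family > person > income
    hierarchy into one flat row per person."""
    rows = []
    for unit in sample_units:
        for hh in unit["households"]:
            for fam in hh["families"]:
                for person in fam["persons"]:
                    row = {"unit_id": unit["id"],
                           "household_id": hh["id"],
                           "family_id": fam["id"],
                           "person_id": person["id"]}
                    # Each income record becomes a column on the person
                    # row; repeated income types are summed.
                    for inc in person["incomes"]:
                        col = "inc_" + inc["type"]
                        row[col] = row.get(col, 0) + inc["amount"]
                    rows.append(row)
    return rows


# Hypothetical hierarchy with one unit, one household, one family
units = [{"id": "U1", "households": [{"id": "H1", "families": [{
    "id": "F1", "persons": [
        {"id": "P1", "incomes": [{"type": "wages", "amount": 900},
                                 {"type": "wages", "amount": 100}]},
        {"id": "P2", "incomes": [{"type": "afdc", "amount": 250}]}]}]}]}]

rows = flatten(units)
```

Each row carries the identifiers of every level above the person, so household or family groupings can still be rebuilt from the flat file.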

An internal Census Bureau committee investigated
the need for and use of a data base management
system for SIPP within the Census Bureau's
current processing environment. The study
concluded that Census Bureau options in the
statistical data base field were quite limited and
that,  of  the  data  base  management  systems  (DBMS) 




the list representing all the persons assigned
for interview in a month. The list will be
updated regularly to account for persons who are
added to or deleted from the sample.

The ID is intended to provide a means of linking
information about an individual across time
and uniquely identifying which household each
person is a member of at any point in the panel.
Through  the  ID  system,  we  expect  to  link  data 
from  all  persons  ever  associated  with  a  sample 
unit  throughout  the  2  1/2-year  duration  of  a 
panel.  This  will  facilitate  the  construction  of 
household  income  estimates  based  on  the  actual 
composition  of  households  during  the  measurement 
period.  More  information  about  the  construction 
and use of the ID number can be found in Nelson,
McMillen, and Kasprzyk (1984) and Jean and
McArthur  (1984). 

SIPP  Content:  Control  Card 

The  control  card  is  used  to  obtain  and  main- 
tain information  on  the  basic  characteristics 
associated  with  households  and  persons  and  to 
record  information  for  operational  control  pur- 
poses. Characteristics  recorded  on  the  control 
card  by  the  interviewer  include  the  age,  race, 
ethnic  origin,  sex,  marital  status,  and  educa- 
tional level  of  each  member  of  the  household,  as 
well  as  information  on  the  housing  unit  and  the 
relationship  of  the  householder  to  other  members. 
A  household  respondent  provides  this  information, 
which  is  updated  at  each  interview.  The  control 
card  is  also  used  to  keep  track  of  when  and  why 
persons  enter  and  leave  the  household,  thereby 
providing  enough  information  to  automatically 
create  monthly  household  and  family  groups. 
There  is  also  space  to  record  information  that 
will  improve  our  ability  to  follow  persons  who 
move during the survey. In addition, after each
visit, data on employment, income, and other
information are transcribed from the core
questionnaire to the control card so the data can be used
in the next interview.
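
The way entry and exit records can yield monthly household groups might be sketched as below. The roster layout (person, month entered, month left) and the convention that `left` is the first month a person is no longer present are assumptions for illustration, not the control card's actual format.

```python
def household_members(roster, month):
    """Return the IDs of persons in the household during `month`.

    roster: list of (person_id, entered, left) tuples, with months
    numbered from the start of the panel; left is None for persons
    who have not left, otherwise the first month they are absent.
    """
    return [pid for pid, entered, left in roster
            if entered <= month and (left is None or month < left)]


# Hypothetical household: one person leaves, another joins later
roster = [("101", 1, None), ("102", 1, 6), ("103", 4, None)]

month_3 = household_members(roster, 3)
month_6 = household_members(roster, 6)
```

Applying the function to each month in turn gives the household composition on which monthly household income estimates could be based.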

SIPP  Content:  Core  Data 

The  content  of  SIPP  was  developed  around  a 
"core"  of  labor  force  and  income  questions 
designed  to  measure  the  economic  situation  of 
persons  in  the  United  States.  These  questions 
expand the data currently available on the
distribution of cash and noncash income and are
repeated at each interviewing wave. SIPP core
data build an income profile of each person aged
15 and over in a sample household. The profile
is developed by determining the labor force
participation status of each person in the sample
and asking specific questions about the types of
income received, including transfer payments and
noncash benefits from various programs for each
month of the reference period. A few questions
on private health insurance coverage are also
included in the core.

Persons employed at any time during the 4-month
reference  period  are  asked  to  report  on  jobs  held 
or  businesses  owned,  number  of  hours  and  weeks 
worked,  hourly  rate  of  pay,  amount  of  earnings 
received,  and  weeks  without  a  job  or  business. 

In  addition  to  questions  about  labor  force 
activity and the earnings from a job,
self-employment, or farm, the core includes questions
related to nearly 50 other types of income.
Questions about common income types are specifically
asked, while the receipt of less common
income types is elicited through general probing
questions. Questions are asked about the receipt
of  government  transfer  payments  from  retirement, 
disability,  unemployment  benefits,  and  welfare 
programs.  Information  on  the  receipt  of  noncash 
benefits  from  programs  such  as  Medicare  and 
Medicaid is also obtained. Other income
questions in the core relate to private transfers
such  as  pensions  from  employers,  alimony,  and 
child  support.  For  certain  income  types,  such  as 
food  stamps  and  AFDC,  questions  are  included 
which  help  to  identify  the  household  members 
covered  by  the  payment,  thus  allowing  the  proper 
construction  of  program  analysis  units.  Finally, 
the  core  data  also  include  questions  on  the 
ownership  of  assets  which  produce  income,  such  as 
savings  accounts,  money  market  accounts,  NOW 
accounts,  stocks,  mutual  fund  shares,  and  rental 
property.  The  amounts  of  income  received  from 
these  income  producing  assets  are  also  obtained, 
as  well  as  indications  of  joint  holdings  and 
estimates  of  account  balances  if  the  amount  of 
interest is not known.

SIPP  Content:  Fixed  Topical  Modules 

The  core  data  provide  information  on  a  contin- 
uing basis  about  levels  of  economic  well-being 
and  changes  in  these  levels  over  time.  These 
data,  while  extremely  detailed,  allow  analyses 
of well-being which only account for income and
demographic variables. The SIPP has been designed
to provide a broader context for analysis by
adding  questions  on  a  variety  of  topics  not 
covered  in  the  core  section.  These  questions 
are  labelled  "fixed  topical  modules"  and  are 
assigned  to  particular  interviewing  waves  of  the 
survey.   If  more  than  one  observation  is  needed, 
questions  on  one  wave  may  be  repeated  on  a  later 
wave. 

The administration of these modules of questions
is made possible by the fact that less time
is required to update the core information
collected in the first interview. Also, the topics
covered  in  these  modules  do  not  require  repeated 
measurement  at  each  interview  and,  therefore, 
may  use  a  reference  period  longer  than  the  period 
used  for  the  information  obtained  in  the  core. 
For  example,  the  third  SIPP  interview  question- 
naire collects  information  on  health  and  dis- 
ability, and  education  and  work  history.  The 
former  are  obtained  because  they  are  among  the 
major  factors  affecting  a  person's  ability  to 
work,  his/her  earnings,  sources  of  income,  and 
participation in public programs. The latter
provide data to help understand a person's
economic situation in relationship to his/her
educational and occupational background. The
fourth interview contains topical data on assets
and  liabilities,  retirement  and  pension  coverage, 
and  housing  conditions/energy  usage.  The  col- 
lection of  assets  and  liabilities  data  allows 
the  study  of  economic  well-being  beyond  that 
which  can  be  observed  through  the  study  of  income 
alone. Participation in many Federal programs
is contingent upon assets held by the individual
of the SIPP to correct and verify a respondent's
SSN. 

Having  established  the  link  for  matching 
activities, work is now proceeding on identifying
content and availability of administrative record
systems for use in: a) data augmentation for
research and estimates; and b) survey data
evaluation. In the former, a working group is
developing a research plan and methodology
proposal in which SIPP demographic data would be
merged with economic data from census files to
form a microdata base for both individuals and
the firms in which they work. Information
concerning this study can be found in Haber,
Ryscavage,  Sater,  and  Valdisera  (1984). 

Another  area  of  research  with  respect  to 
administrative record systems is the development
of validation studies of items common to both the
survey  and  administrative  records.  The  goal  of 
the  project  is  the  improved  understanding  of  the 
quality  of  the  SIPP  data  and,  ultimately,  the 
development  of  quantitative  estimates  of  response 
and  nonresponse  errors  for  the  purposes  of  adjust- 
ing survey  data  or  modifying  survey  procedures 
to  obtain  better  quality  data.  The  first  aspect 
of  this  work  is  the  development  of  a  research 
plan  identifying  activities  required  to  evaluate 
SIPP  data  using  administrative  records.  The 
second  aspect  is  a  demonstration  and  feasibility 
study  to  examine  response  and  nonresponse 
errors  from  Waves  1  and  2  of  the  SIPP  using 
administrative  record  systems  for  the  major 
transfer  programs  from  a  limited  number  of 
states. 

SIPP  Research:  Panel  Surveys  as  a  Source  of 
Migration  Data 

SIPP data can be used to address a wide
variety of migration topics in two ways: 1) in
traditional cross-sectional analyses, these
data serve to further understanding of how
geographical mobility leads to adjustments in
labor markets, housing markets, etc.; and 2) the
survey's longitudinal design provides a natural
source of geographical mobility data because
individuals are followed should they move to a
new  place  of  residence.  The  first  stage  of  this 
work  is  to  review  analyses  of  migration  from 
previous  panel  surveys  and  then  assess  how  SIPP 
can  further  our  understanding  of  geographical 
mobility  processes.  Some  results  from  this  work 
are  described  by  Dahmann  (1984). 

SIPP Research: Wave-to-Wave Changes in Income
and  Program  Receipt 

Analysts with an interest in the quality of
the data obtained in the 1979 ISDP Panel (and
the subsequent SIPP) have expressed concern that
respondent reports of income receipt may be
flawed. There appears to be a tendency for
reported program turnover to occur between waves
more often than within waves; that is, in the
pilot surveys of the development program, between
months 3 and 4 rather than the other 4 consecutive
pairs of months. It is assumed that this probably
represents response error arising from imperfect
recall, although other factors (for example,
mismatching in linking the data files) might also
account for this effect. Some analyses of this
phenomenon using five waves of data from the
1979 Panel and examining the extent to which
differences are related to respondent status
patterns across survey waves are presented in
Moore and Kasprzyk (1984). Additional work
using SIPP data is planned as soon as data are
available.
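
A simple way to look for this pattern is to count reported status changes for each pair of consecutive months. The sketch below assumes two linked waves of 3 months each, so that months 3 and 4 straddle the boundary between waves, as the text implies; the receipt histories are invented solely for illustration and are not survey results.

```python
def transitions_by_pair(histories):
    """Count status changes between each pair of consecutive months.

    histories: list of equal-length sequences, one per person, with
    1 for reported program receipt in a month and 0 otherwise.
    Element i of the result covers months i+1 and i+2.
    """
    n_months = len(histories[0])
    counts = [0] * (n_months - 1)
    for history in histories:
        for i in range(n_months - 1):
            if history[i] != history[i + 1]:
                counts[i] += 1
    return counts


# Invented six-month histories (months 1-3 from one wave, 4-6 from the next)
histories = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

counts = transitions_by_pair(histories)
between_wave_changes = counts[2]             # months 3 and 4
within_wave_changes = sum(counts) - counts[2]
```

A count for the between-wave pair that is much larger than the counts for the within-wave pairs would be consistent with the tendency described above, although it would not by itself distinguish recall error from linkage mismatches.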

SIPP  Research:  Longitudinal  Feedback  and 
Reconciliation  System 

Because  of  its  design,  SIPP  has  a  potential 
for  missing  and  inconsistent  data  problems  from 
wave  to  wave.  The  issue  of  concern  is  the 
development  of  appropriate  forms  and  procedures 
to  identify  and  correct  longitudinal  data 
problems  during  the  collection  rather  than  the 
processing  phase.  An  automated  income  and  work 
experience  profile  to  identify  potential  cross- 
wave  edit  failures  and  data  problems  could  help 
in  the  development  of  SIPP  longitudinal  data 
products.  This  profile  would  contain  responses 
on  labor  force  activity  and  amounts  of  income 
and  program  benefits  received  during  the  previous 
calendar  year.  It  would  be  reviewed  for  accuracy 
by  each  respondent  at  the  conclusion  of  the 
calendar  year.  A  system  such  as  this  should 
clarify apparently inconsistent responses by using
previously  reported  amounts  to  identify  and 
reconcile  cross-wave  inconsistencies. 

Prior to designing and implementing a reconciliation
system to smooth transition data during a
calendar  year,  a  preliminary  review  of  a  sub- 
sample  of  questionnaires  for  the  first  two  waves 
of  SIPP  will  be  conducted. 

SIPP  Research:  Sampling  for  Special  Populations 

After the SIPP becomes established, Federal
program agencies may be interested in adding
sample cases for specific subpopulations of
interest to policy analyses, such as the high and
low  income  groups,  Blacks  and  Hispanics,  and  the 
aged  and  disabled.  A  multi-divisional  work 
group  is  discussing  methods  for  oversampling 
special  populations.  In  particular,  a  variety 
of  subsampling  (screening)  proposals  will  be 
analyzed. 

The statistical issue under consideration is
the reliability of estimates when different
subsampling schemes are introduced. During the
last several months, subsampling characteristics
based on income and demographic variables have
been identified, and estimates of reliability
have been obtained for different subsampling
rates and different characteristics. A summary
of results is now in preparation.

SIPP  Research:  Sampling  Error  Estimation 

Applications of SIPP data are expected for a
wide variety of analyses: microsimulation
modeling, multivariate analysis, and simple
tabulations and cross-tabulations. Some topics
under study in the Statistical Research Division
are: 1) an investigation of currently available
computer software which provides general
procedures for computing sampling error estimates
for complex survey designs; and
2) an assessment of Census Bureau procedures for
computing and estimating the median and its
variance.

available at the Census Bureau, the Scientific
Information Retrieval (SIR) DBMS seemed to be
the most suitable for SIPP. As a result of this
study, SIPP data are now being structured in the
SIR data base format by the Census Bureau's
Systems Support Division.

Along with the SIR data base construction,
consideration is being given to the development
of a software system to accompany SIPP data files
and allow the relatively easy generation of
extract files of records focusing on changes,
duration, and spells of a particular status.
These extract files would serve as an
intermediate product to be used as input into the
more widely available statistical software
packages.
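
To make the idea of an extract of changes, durations, and spells
concrete, here is a minimal sketch that collapses one person's
month-by-month status into spells and transitions. The function
names, the status labels, and the zero-based month index are
assumptions made for illustration; they do not describe the
software system under consideration.

```python
from itertools import groupby

def spells(monthly_status):
    """Collapse a month-by-month status list into (status, start, length) spells."""
    result = []
    month = 0
    for status, run in groupby(monthly_status):
        length = len(list(run))            # consecutive months in this status
        result.append((status, month, length))
        month += length
    return result

def transitions(monthly_status):
    """List (month, old_status, new_status) for every month-to-month change."""
    pairs = zip(monthly_status, monthly_status[1:])
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(pairs, start=1)
        if prev != cur
    ]

# Hypothetical eight-month history for one person.
history = ["employed"] * 3 + ["unemployed"] * 2 + ["employed"] * 3
print(spells(history))
print(transitions(history))
```

A flat file of such spell records, one row per spell, is the kind
of intermediate product that general statistical packages can read
directly.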

Finally, to improve the understanding of and
access to SIPP data, a data product and delivery
program is being developed. This program would
include introductory SIPP workshops and user
workshops, newsletters, and guides to increase
the understanding of SIPP data products.

SIPP  Research:     Longitudinal   Concepts-Household 
and  Family  Definition 

Household and family level analysis in a
longitudinal survey is complicated by the fact
that the composition of households and families
can change over time, since original sample
persons leave to join other households or families,
or to set up new ones. The principal issue is
the development of definitions of households
and families which account for survey
measurements at two or more points in time and which
do not create serious conflicts with the traditional
cross-sectional household and family constructs.
McMillen and Herriot (1984) discuss this problem
in detail by examining cross-sectional concepts
and their deficiencies, examining and evaluating
several longitudinal concepts, proposing a
concept for SIPP, and then illustrating how the
concept would be used in calculating aggregate
household/family characteristics and in tabulating
the number of households, household types, and
their characteristics.

SIPP  Research:     Longitudinal   Estimation 

Weighting the longitudinal sample, especially
for analytic units other than the individual, is
a difficult area requiring a continuing
statistician-analyst dialogue. Detailed longitudinal
weighting procedures will not be developed until
closure is reached on definitions for
longitudinal analysis units. Despite the lack of
closure, research on this topic has proceeded
along two dimensions: longitudinal person
estimation and longitudinal household (family or
recipient unit) estimation. Since, at a minimum,
the first SIPP longitudinal microdata products
will be files of person-based information with
household (family or recipient unit) as an
attribute of the person, the early emphasis of
the work is on longitudinal person estimation.
Preliminary thoughts on this topic are described
by Judkins, Hubble, Dorsch, McMillen, and Ernst
(1984).

The topic of longitudinal household (family or
recipient unit) estimation is also under study.
The work includes the development of an explicit
statement of the longitudinal estimation problem,
the survey universe under several definitions of
longitudinal household, and the principal
estimation issues which require study. Preliminary
thoughts on this topic are discussed in Ernst,
Hubble, and Judkins (1984).

SIPP  Research:     Longitudinal   Imputation 

The varieties of nonresponse (unit nonresponse,
defined as nonresponse to all waves of the survey;
wave nonresponse, defined as nonresponse to a
particular wave interview; and item nonresponse,
defined as nonresponse to a particular item) and
their patterns pose numerous difficulties for
designing appropriate strategies for nonresponse
compensation. At this time several aspects of the
problem are being developed. The Statistical
Research Division at the Census Bureau is studying
patterns of item "missingness" and the frequency
of state-to-state transitions, and is evaluating
proposed imputation strategies, both the
model-based and hot-deck types, for labor force,
wages, and salary items on the 1979 ISDP Panel.
Some preliminary ideas on the treatment of
multiwave item nonresponse in the SIPP are
discussed in Samuhel and Huggins (1984).

In  addition,  two  other  aspects  of  the  problem 
are  also  under  development: 

1) To study item nonresponse for property
and program income, in particular
examining levels and patterns of item
nonresponse and developing and comparing
methods to treat nonresponse in these
subject areas using more than one wave of
data. This work is similar to the
previously discussed work on labor force,
wages, and salaries, but in a different
subject-matter area.

2) To study general strategies of handling
missing data in panel surveys, including:
(a) a discussion and analysis of issues
pertaining to weighting versus imputing
for attrition cases; (b) a discussion of
the treatment (weighting or imputation)
of the so-called "non-nested" missing
data cases; (c) an empirical examination
and comparison of weighting and
imputation using data from the 1979 ISDP
Panel; and (d) a discussion based on
empirical results of when to choose one
strategy over the other.
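
The wave-level patterns named above can be told apart
mechanically. The sketch below classifies one sample unit from a
list of per-wave response flags. The text does not define
"attrition" or "non-nested", so the definitions used here are
assumptions based on common usage: attrition means responding to
every wave up to some point and never again, and non-nested means
a missed wave followed by a later response. The function name is
hypothetical.

```python
def classify_wave_pattern(responded):
    """Classify a unit's wave response pattern.

    `responded` holds one boolean per wave (True = interviewed).
    """
    if all(responded):
        return "complete"
    if not any(responded):
        return "unit nonresponse"          # nonresponse to all waves
    last_response = max(i for i, r in enumerate(responded) if r)
    if all(responded[: last_response + 1]):
        return "attrition"                 # assumed: dropped out and never returned
    return "non-nested"                    # assumed: a gap followed by a response

# Hypothetical patterns over three waves.
for pattern in ([True, True, True], [True, True, False],
                [True, False, True], [False, False, False]):
    print(pattern, classify_wave_pattern(pattern))
```

Separating the two incomplete patterns matters for the choice
discussed in item 2, because a weighting adjustment is simplest to
define when the missing waves form a single block at the end.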

SIPP Research: Use of Combined Survey and
Administrative Data

During its development period, SIPP had been
viewed as an integrated data system, combining
survey data with administrative record data.
Because of this emphasis, an internal Census
Bureau committee was formed to assess and make
recommendations regarding the potential uses of
administrative records, the development of
demonstration pilot studies, and the special
confidentiality or privacy issues involved in the
use of administrative records. The committee
developed a proposal, later implemented, to
electronically validate reported social security
numbers (SSNs), to manually search for SSNs not
reported correctly, and to use the panel aspect


SIPP  Research:  The  American  Statistical 
Association-Census  Bureau  Research 
Fellow  Program 

Recognizing that SIPP research cannot be
confined solely within the Census Bureau, SIPP
research planning has encouraged the use of
development program data to increase the
understanding of the SIPP data and the inherent
difficulties of working with this new data base.
One aspect of this activity has been the
expansion of the ASA-Census research fellow program
to identify explicitly SIPP-related research
activities. As a result, two research fellows
have been chosen for the 1984-1985 academic
year: Harold Watts (Columbia University) and
Constance Citro (National Academy of Sciences
and Mathematica Policy Research). Dr. Watts
intends to use the 1979 Panel data to understand
changes in living arrangements and how long they
last. He is interested in short term gross flow
patterns of change in household status and will
characterize the changes in the household status
of the individual from one wave to another. Dr.
Citro intends to simulate alternative definitions
of household continuity. She will develop tables
of annual household and family income under
alternative definitions using the data from the
development program's 1979 Panel and will help
interpret the meaning and usefulness of the
alternative measures.

Summary 

This  paper  has  reviewed  the  development  and 
current  activities  of  the  Survey  of  Income  and 
Program  Participation.  Survey  design  features, 
selected  field  procedures,  content  of  the  survey, 
and data products have been described. As a
number of methodological issues remain unresolved,
selected  research  activities  have  been  described. 
While  covering  many  aspects  of  the  SIPP,  this 
paper  was  not  meant  to  be  comprehensive.  The 
scope  of  topics  discussed,  however,  illustrates 
the  nature,  breadth,  and  opportunities  of  the 
Survey  of  Income  and  Program  Participation. 


1/  To  request  tapes  and  documentation  describing 
the  history  of  the  1979  Research  Panel,  sample 
design,  survey  content,  estimation  procedures, 
and  data  collection  and  processing  procedures, 
write  to:  Department  of  Commerce,  National 
Technical  Information  Service,  5385  Port  Royal 
Road,  Springfield,  Virginia  22161  or  call 
(703) 487-4807.

Figure 2. SIPP Interview Months and Reference Periods

THE GERMAN SOCIO - ECONOMIC PANEL

Ute Hanefeld, Deutsches Institut für Wirtschaftsforschung, Federal Republic of Germany


Introduction 

The  German  Socio  -  Economic  Panel  was  started  with 
its  first  wave  in  the  spring  of  1984.  It  will  provide 
representative  longitudinal  data  on  income,  transfer 
payments,  labor  market  experience,  changing  family 
composition,  and  housing  for  individuals,  families  and 
households.  In  addition,  it  will  allow  representative 
cross-sectional  analyses.  It  is  the  first  study  of  this 
kind  in  the  Federal  Republic  of  Germany. 

The  panel  is  designed  for  a  variety  of  analyses  which 
could  not  be  conducted  for  West  Germany  without 
this  data  base.  Primarily,  the  dynamics  of  individual 
behavior  and  personal  characteristics  are  of  interest. 
For  example,  income  changes  of  individuals,  the 
causes  of  these  changes  and  their  consequences  will 
be analyzed. Another area of interest is changes in
labor market participation, particularly for women.
Patterns of behavior and their interdependencies with
the living conditions of the household and the
behavior of the other household members will be studied.

The  sample  is  representative  of  the  entire  population 
of  West  Germany,  including  foreigners  and,  at  least 
partially,  institutionalized  persons.  All  household 
members  16  years  and  older  are  interviewed  annually 
in 6,000 households. Each initial sample member will
be  followed  within  the  area  of  the  Federal  Republic 
of  Germany  in  the  future.  At  present,  data  collection 
for  at  least  five  years  is  planned.  If  the  study  is 
successful,  it  is  intended  to  add  further  waves. 

The study is conducted by the Special Collaborative
Programme 3 'Microanalytical Foundations of Social
Policy' (Sonderforschungsbereich 3 'Mikroanalytische
Grundlagen der Gesellschaftspolitik') at the
Universities of Frankfurt and Mannheim, in collaboration
with the 'German Institute for Economic Research'
(Deutsches Institut für Wirtschaftsforschung, DIW) in
Berlin. The fieldwork is done by Infratest
Sozialforschung, a survey institute in München. The
project is funded by the German Research Society
(Deutsche Forschungsgemeinschaft, DFG) in Bonn.

Aims  of  the  study 

Up to now, only a few panel studies have been
conducted in the social sciences in the Federal
Republic of Germany. These studies focus on
individuals and in general do not take the household
into account. In addition, they often have very small
sample sizes and concentrate on special groups of the
population and on special topics. Some examples are:
panels that investigate the transition of youth from
education to the labor market (1), fertility studies
(2) and one panel between 1978 and 1982 concerning
unemployment (3). Panel studies concentrating on
questions about income, transfer payments or labor
market experience do not exist.

With the Socio - Economic Panel, a broader purpose
is intended:

It is designed to be representative of the
entire population of West Germany.

It is a multi-purpose survey covering a broad
range of aspects concerning living conditions.

Usually surveys for the Federal Republic of Germany
are only representative of the German population
living in private households (4). Because of language
and sample selection problems, foreigners, who made
up 7.4 % of the population in 1983, are not sampled.
Further, the institutionalized population (5) is not
included. Even the German Federal Statistical Office
(Statistisches Bundesamt) does not take these two
groups into account in most surveys except the annual
microcensus, which is comparable to the Current
Population Survey of the U.S.

Because of the importance of guestworkers in many
aspects of today's economy, this group has been
included in the Socio - Economic Panel and
oversampled to allow separate analyses. The
institutionalized population, with 1.5 million persons
in 1970 (6), is not negligible either. In the official
definition, it consists to a large part of elderly
persons living in old people's homes and of employees
living in hostels. People who might be more difficult
to interview because they are ill or living in prisons
or barracks account for less than half of the whole
group. Additionally, the exclusion of the
institutionalized population from the panel would lead
to substantial panel mortality if persons were not
followed when they move from a household to an
institution. Therefore, an attempt has been made to
include this population in the panel, too.

The study is designed to provide longitudinal data for
individuals as well as for families (7) and households.
Many problems refer to individuals, e.g. labor market
experiences or earnings. However, it is also
important to know the family and household background of
the person. For other topics, like poverty or housing,
information on the household level is needed.
Analyses of transfer programs require data for families
and households.

Cross-sectional analyses have shown that in recent
years the size of households decreased and the
number of households increased. But to a large extent
it is unknown how this process works and what its
causes are. With the Socio - Economic Panel it will
be possible to investigate changes in the household
composition and their causes and circumstances. At
the same time, the influence of changing household
composition on living conditions of the household
members is of great interest. For example, one might
ask about the impact of divorce or death on family
income or on the labor market participation of the
household members.

For the Federal Republic of Germany little is known
about income changes on the level of households and
individuals. It is planned to analyze the frequency and
extent of such income changes, their causes and how
people cope with such situations. Investigations of
the poverty population will be of special interest.
Which factors are responsible for the entry into low
income groups? How much fluctuation exists
within the poverty population? Who is able to leave
the poverty status and by which means? In connection
with such questions, it will be possible to analyze
transfer program participation, and whether
different transfer programs reach their target groups
and their goals.

During the last ten years unemployment became a
severe problem in the Federal Republic of Germany.
Changing demographic structures increase the
difficulties on the labor market because at the moment
large cohorts are entering the labor force. This
situation intensifies the necessity of longitudinal
analyses, especially of occupational careers and
mobility.

Concerning labor supply, information about the
patterns of behavior of women and youth is important.
With the panel study, it will be possible to analyze
female labor force participation, interruptions in
labor supply and their duration, and the possibilities
and circumstances for reentering the labor market.

As mentioned above, some panel studies already exist
in the Federal Republic of Germany concentrating on
the transition of youth from education to the labor
market. Because these studies observe more cases of
the particular subgroups than the Socio - Economic
Panel, analyses based on these data can be more
detailed. But as these studies concentrate on
persons, and because some of them only follow
selected cohorts, it is the task of the Socio - Economic
Panel to supplement household related information
and to check whether the results concerning special
cohorts can be generalized.

Another important area of the Socio - Economic
Panel will be the analysis of job mobility. Until now,
little is known about who is changing jobs and
for what reasons, how people find a new job, what
career patterns exist and what the consequences
of job changes are for other living conditions.

The  fourth  main  area  of  the  panel  concerns  housing 
and  residential  mobility.  Although,  since  the  Second 
World  War,  government  has  supported  the  housing 
market  with  diverse  programs,  housing  is  still  an 
important  theme  in  political  debates.  Cross-sectional 
analyses have shown that there is still a
remarkable imbalance in the distribution of quality
and  quantity  of  housing.  Growing  aspirations  and  an 
increasing  number  of  households  are  discussed  as 
causes  for  this  situation. 

With the Socio - Economic Panel it will be possible to
study the development of the housing market and the
efficiency of transfer programs related to housing.
For example, it is planned to analyze which
households move, what the causes of residential
mobility are, and who is changing from renter status to
owner status and vice versa.

Besides the main topics of the study (changing family
composition, income, transfer payments, labor
market experience and housing), there are questions
about education, health and attitudes.

It is not possible to mention all the planned analyses
here. But as can be seen from the broad range of topics,
the Socio - Economic Panel is designed as a
multi-purpose survey. The main interest of the study is to
investigate not only the changes within the particular
areas, but also the interdependencies among them.
On the other hand, this implies that analyses in a
single area cannot be as detailed as in special studies
conducted for that purpose. As an example,
information about unemployment will not be as detailed as in
a specialized panel, but the Socio - Economic Panel
will show such relationships as the consequences of
unemployment for the labor market participation of
other household members, for family income or for
residential mobility.

The  sample 

As explained above, the population of the Socio -
Economic Panel is defined as the entire population of
the Federal Republic of Germany including
guestworkers and institutionalized persons. However,
general problems arose with regard to the sample
selection. Initially, it was intended to build up a new
sample frame based on the census which was planned
for 1983 (8). Unfortunately, this census was cancelled
because of the growing discussion about privacy
issues. In summary, since other approaches proved to
be infeasible, there was no sample frame available
covering the entire population. Even the existing
sample frame for the microcensus could not be used
because the samples were not ready, and the work
that still needed to be done could not be completed
because of the political controversy over invasion of
privacy. Besides, this sample frame probably is biased
since it is still based on the 1970 census.

Therefore, two subsamples were constructed and for
each a special sample was drawn:

Sample A: Persons in households with a head who is
not Spanish, Italian, Greek, Turkish or Jugoslavian.

Sample B: Persons in households with a Spanish,
Italian, Greek, Turkish or Jugoslavian head.

For sample A, an existing sample frame used by the
German survey institutes (9) was modified. It is an
area sample based on voting districts of the last
election. Because guestworkers are not allowed to
vote, they are not represented in this frame. For the
4,500 households of sample A, 584 sample points were
selected systematically with a random start. In each
sample point the interviewer got a starting address.
Starting from this address, the interviewer had to
note the following 84 addresses. He or she then had
to recruit every seventh household for the panel,
which gives twelve addresses per sample point, but
the last two addresses were only used if fewer than
eight interviews were obtained from the first ten
households.
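
The selection steps for sample A can be sketched as follows. This
is an illustration under stated assumptions, not the survey
institute's procedure: the frame size of 58,400 voting districts
is made up, the function names are hypothetical, and the text does
not say whether "every seventh" counts from the first or the
seventh listed address (the seventh is assumed here).

```python
import random

def systematic_sample(frame_size, n_points, rng):
    """Pick n_points frame positions at a fixed interval with a random start."""
    step = frame_size // n_points          # sampling interval
    start = rng.randrange(step)            # random start inside the first interval
    return [start + i * step for i in range(n_points)]

def every_kth(listed_addresses, k=7):
    """Every k-th listed address, counting from the k-th (an assumption)."""
    return listed_addresses[k - 1::k]

def addresses_to_work(candidates, interviews_in_first_ten):
    """First ten candidates always; the last two only as a reserve."""
    if interviews_in_first_ten < 8:
        return candidates
    return candidates[:10]

rng = random.Random(1984)
points = systematic_sample(frame_size=58_400, n_points=584, rng=rng)  # made-up frame size
listing = list(range(1, 85))               # the 84 addresses noted by the interviewer
candidates = every_kth(listing)            # 84 / 7 = 12 candidate addresses
print(len(points), len(candidates))
```

With ten regular addresses per sample point and a target of about
eight interviews, 584 sample points are of the same order as the
4,500 households planned for sample A.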

Sample B includes the main groups of guestworkers.
It has a sample size of 1,400 households: 400 Turkish,
300 Italian, 300 Jugoslavian, 200 Greek and 200
Spanish households. For this population there are
regional administrative registers. An area sample was
drawn separately for each nationality. The sample
points again were selected by systematic choice with
a random start. Altogether, 240 sample points were
chosen and the administrations of the corresponding
counties were asked to select 20 addresses at
random. At least 7 addresses per sample point were
passed to the interviewers to be recruited as panel
households.

For the pretest, done in autumn 1983 with 200
households, a separate sample for the
institutionalized population was tested. However, since
no sample frame is available and data are
insufficient, for the moment it is only possible to
construct a quota sample. The results of the pretest
were mixed. There were no special problems getting
an interview from people living in old people's homes,
hostels, dormitories etc. But big problems arose when
interviewing people in barracks, sanatoria etc. There,
in most cases, the interviewer was not even able to
obtain permission to select a respondent and contact
him.

It might be possible to draw a sample for the whole
institutionalized population if separate
questionnaires, which take into account the special
problems of this group, were designed and if special
efforts were made to get the permission of the
administration of the institutions. Since the subsample
would have had a size of only 180 persons in the main
waves, the necessary additional costs and work did
not appear to be justified in view of the expected
result. Therefore, the decision was made to include
that part of the institutionalized population where
there were no extraordinary difficulties in sample A
or sample B, respectively. For the remaining part, it
was decided that each sample member who moves
into such an institution will also be followed. With
this concept, it is hoped to cover the entire
institutionalized population in the course of the panel.

Interviewing  procedure 

In many studies, only the head of the household is
interviewed, but as the Panel Study of Income
Dynamics has shown, the head changes quite frequently
(10). For a panel, this would mean that different
respondents answer the questionnaire in different
waves. In addition, there is the risk that the head is
not well enough informed about characteristics of
other household members. Besides, information on
attitudes is required. But it is not feasible to ask the
head of the household about attitudes of other
household members, since subjective variables can only
be measured appropriately by self-interviews. If only
heads of households were interviewed, information
would be biased. On the other hand, if the respondent
were selected randomly in the household, the risk is
even higher that he or she is not well enough informed
about household related questions. Therefore, it was
decided to interview all household members 16 years
and older in the Socio - Economic Panel.

Standardized fixed format questionnaires are used for
the data collection. There are bilingual
questionnaires for the foreign subsample. Each of the
five nationalities gets a questionnaire in its mother
tongue, and under each question the German version is
also printed. This form was chosen in case there were
different nationalities in a household and to assist
the German interviewer in keeping control over the
interviewing process.

As none of the survey institutes has a permanent
foreign language fieldstaff and because it is
expensive and risky to build a new staff which would be
lacking in interview experience, an unusual form of
interviewing was tried in the pretest. Each
interviewer in the foreign subsample had to find a
foreigner of that nationality to accompany him or her.
This person was trained very briefly by the German
interviewer to help in making the first contact and, if
necessary, in doing the interview. The pretest showed
that, in general, this concept works well. However, in
cases when the respondents knew German well, these
accompanying persons sometimes were disruptive.
Therefore, the rule was changed, so that the
interviewer could use such a person, but is not required
to do so.

To train the interviewers for the first wave, each got
an interviewing package that included a test with
several questions, particularly concerning definitions
of the population and the use of the coversheet. For
the second wave, this test will focus mainly on
questions dealing with how to follow respondents and
whom to interview. In addition, the interviewers had
to conduct one test interview of a sample household.
The test and the test interview were checked by
the survey institute, and if necessary, the interviewer
was retrained before he or she was allowed to do
further interviews. This concept worked very well,
although it should be mentioned that many
interviewers did not pass this training, either because
they did not want a work load considerably above that
of usual surveys, or because they made too many
mistakes. As a result of this procedure, 168
interviewers who did not qualify were replaced.
Finally, a total of 622 interviewers was engaged in
the fieldwork.

The  questionnaires 

Three different questionnaires were used for the first
wave of the Socio - Economic Panel. The coversheet
is the main instrument to control the panel
systematically. The interviewer has to fill it out for
each sampled household. In the first wave, the name of
the household had to be noted and, for each address,
whether the contact was successful or what the cause
of a drop-out or a refusal was. In a second part
of the coversheet, the interviewer had to note all
household members 16 years and older by name,
gender, year of birth and relation to the head of the
household. If a household member dropped out or
refused, the interviewer also had to note the special
reason. In addition, there were some questions about
the housing environment which provide
supplementary information for the analyses of refusals.
With this coversheet each sample member got an
identification number.

Beginning with the second wave, each coversheet will
be printed with the household address, and the
identification number, name, gender and year of birth of
all household members as of the last data collection.
If the whole household has moved, the interviewer
has to note the new address. He/she has to indicate
whether new members belong to the household and, if
this is the case, their main characteristics. If
household members leave the household, the interviewer
has to find out their new address and send it to the
institute. For households that moved or split, the
institute fills out a special coversheet and gives it
to an interviewer living in that area.

The identification number, the first name, gender and
year of birth are noted on the other questionnaires as
well and used, in combination with the information of
the coversheet, to ensure that the respondents
remain the same from one wave to the next.
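
A check of this kind could look like the sketch below, which
compares the four identifying items on a coversheet entry with
those on a questionnaire. The field names and the dictionary
layout are hypothetical and only illustrate the idea; the text
does not describe how the institute actually carries out the
comparison.

```python
# Identifying items named in the text; the key names are assumptions.
ID_FIELDS = ("person_id", "first_name", "gender", "birth_year")

def identity_mismatches(coversheet_entry, questionnaire_entry):
    """Return the identifying fields on which the two records disagree."""
    return [
        field for field in ID_FIELDS
        if coversheet_entry.get(field) != questionnaire_entry.get(field)
    ]

# Hypothetical records for one respondent in two sources.
coversheet = {"person_id": 101, "first_name": "Anna", "gender": "f", "birth_year": 1950}
questionnaire = {"person_id": 101, "first_name": "Anna", "gender": "f", "birth_year": 1951}
print(identity_mismatches(coversheet, questionnaire))
```

An empty list would indicate that the same person answered; any
listed field flags a case for manual review.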

The  second  questionnaire,  called  the  household  inter- 
view, is  administered  in  an  oral  interview  to  the 
household  head  or  a  household  member  who  is  well 
informed  about  the  whole  household.  It  takes  about 
15  minutes.  This  questionnaire  contains  questions 
which  do  not  need  to  be  directed  to  every  household 
member,  either  because  the  same  information  would 
be  collected  several  times,  such  as  information  about 
the  children,  housing  and  household-related  transfer 
payments,  or  because  there  is,  for  some  variables, 
the  risk  that  it  is  not  clear  whether  the  answer  is 
household or person-related. Examples of the latter
are questions about property.

The  third  questionnaire  is  the  personal  interview. 
Because  in  large  households  it  might  be  a  heavy 
burden  to  conduct  personal  interviews  with  every 
household  member  16  years  and  older,  which  might 
reduce  the  response  rates,  in  sample  A  it  is  up  to  the 
interviewer  to  decide  when  to  give  an  oral  interview 
and  when  to  have  respondents  fill  out  their  own 
questionnaires.  But,  with  some  exceptions,  the  inter- 
viewer is  to  be  present  when  the  respondents  answer 
the  questionnaire,  to  motivate  and  help  them  if 
necessary.  Because  there  are  some  foreigners  who 
have  difficulties  in  writing  and  reading,  in  Sample  B 
the  interviewers  are  instructed  to  conduct  oral  inter- 
views. 

The  personal  questionnaire  takes  about  30  minutes  to 
complete.  It  contains  questions  about  earnings,  per- 
son-related transfer  payments,  labor  market  experi- 
ence, education,  health,  time  use,  attitudes  and  de- 
mography. In  the  first  two  waves  there  are  also 
biographical  questions,  especially  retrospective  ques- 
tions about  labor  market  participation  and  marital 
status.  In  each  wave  the  questions  on  income,  trans- 
fer payments  and  labor  market  participation  cover 
the  previous  calendar  year  to  record  all  changes.  A 
detailed  description  is  only  collected  for  the  current 
job  at  the  time  of  the  interview;  this  means  that 
there  is  no  information  on  any  other  jobs  between 
two  waves. 

Starting with the second wave, there is a supplementary
questionnaire to the personal interview. It is
addressed to new sample members being interviewed
for the first time: either persons who were 15 years old
during the last wave or persons who live together
with initial sample members. The supplementary
questionnaire contains the most important questions
which  were  asked  in  previous  waves  and  not  repeated 
in  the  current  wave.  It  includes,  for  example,  the 
questions  on  the  level  of  schooling  and  vocational 
training  and  the  retrospective  biographical  questions. 

Methods  to  maintain  the  panel 

Although in every wave respondents of the Socio-Economic
Panel are household members 16 years and
older, the sample will be representative of the entire
population because information about children is
gathered in the household interview and children are
sample members, although they are not interviewed
personally. Further, children born to or adopted by initial
sample members will be included in the sample, and
children reaching the age of 16 years will become
respondents. This concept prevents the panel population
from getting older on average. Drop-outs by
death are replaced by the following generation.

In  addition,  all  sample  members  will  be  followed  and 
reinterviewed  within  the  area  of  the  Federal  Repub- 
lic of  Germany  regardless  of  whether  they  move  with 
the  whole  household  or  whether  they  split  off.  This 
concept  allows  observation  of  the  process  of  splitting 
off  and  setting  up  of  new  households.  Non-sample 
members  who  live  together  with  sample  members  in 
later  waves  will  be  recruited  as  respondents  if  they 
are  16  years  and  older,  but  if  these  households  split 
again,  only  the  sample  members,  not  the  non-sample 
members,  will  be  followed. 
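
The following rules described above can be summarized in a short sketch. It is only an illustration: the `Person` fields and the function names are hypothetical and are not taken from the panel's documentation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    is_sample_member: bool     # initial sample members and their children
    lives_in_frg: bool = True  # following is limited to the Federal Republic


def interviewed_while_in_panel_household(p: Person) -> bool:
    """Everyone aged 16 or older in a panel household is a respondent."""
    return p.age >= 16


def followed_after_leaving(p: Person) -> bool:
    """Only sample members are traced when they move or split off."""
    return p.is_sample_member and p.lives_in_frg


original_member = Person(age=40, is_sample_member=True)
later_partner = Person(age=38, is_sample_member=False)  # moved in after wave 1
child = Person(age=10, is_sample_member=True)
```

In this sketch the later partner is interviewed while living with a sample member but is not followed after a split, whereas the child is followed but not yet interviewed.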

Special  problems  might  arise  for  the  sample  of  for- 
eigners. At  the  moment,  the  government  makes  an 
effort to motivate the guestworkers to move back to
their  native  countries.  On  the  other  hand,  a  lot  of 
them  still  bring  their  families  to  Germany  apparently 
expecting  to  stay.  Which  development  will  dominate 
is  unknown.  Movement  out  of  Germany  will  be  shown 
by  the  panel.  To  cover  what  is  probably  the  most 
important  part  of  the  immigration,  there  is  the  rule 
that  persons  moving  from  foreign  countries  directly 
into a sample household will get the status of a
sample  member.  This  means  that  they  will  be 
followed,  if  they  split  from  the  sample  household. 
Persons  who  come  from  foreign  countries  and  form 
their  own  households  or  move  into  the  institutional- 
ized population  are  not  represented  in  the  panel;  it 
may  be  necessary  to  include  a  sample  of  them  from 
time  to  time  in  the  panel.  Altogether  these  rules  will 
make it possible for the Socio-Economic Panel to
reflect  the  natural  development  of  the  population, 
except  for  the  small  group  of  immigrants  mentioned 
last. 

Refusals  are  one  of  the  most  serious  hazards  for  a 
panel.  If  there  are  too  many,  the  panel  will  be 
destroyed  in  a  short  time.  In  the  Federal  Republic  of 
Germany,  there  are  no  experiences  with  response 
rates  which  can  be  directly  compared.  In  existing 
German  panels,  response  rates  have  been  far  smaller 
than  for  instance  for  the  Panel  Study  of  Income 
Dynamics  (11)  or  the  National  Longitudinal  Surveys 
(12)  in  the  U.S.  But  in  most  German  studies,  very 
little  was  done  to  increase  the  response  rates.  A 
comparison  clearly  shows  that  the  more  was  invested 
into  the  motivation  of  the  respondents  and  into  the 
fieldwork,  the  better  were  the  results.  Therefore,  for 
the Socio-Economic Panel special efforts are made
to  reduce  refusals. 

Intensive interviewer training is important for
successful data collection. The procedure has
already  been  explained  above.  To  help  the 
interviewers  make  the  first  contact,  they  presented  a 
small  booklet  to  the  respondents  which  described  the 
purpose  of  the  study,  how  the  household  had  been 
selected,  why  it  is  necessary  to  have  its  participation 
and  how  data  protection  works.  Finally,  it  announced 
that  all  participants  will  receive  a  ticket  for  a  well 
known  lottery. 

In  general,  the  interviewers  have  to  try  to  contact 
the  households  personally,  making  as  many  visits  as 
necessary  to  reach  somebody.  If  they  get  a  refusal  or 
cannot secure a personal contact because the respondents
do not open the door, a letter from the
institute  including  the  booklet  is  sent  to  these  house- 
holds, unless  they  had  refused  for  very  serious  rea- 
sons such  as  data  protection.  Then  another  inter- 
viewer tries  to  contact  the  household.  If  this  attempt 
is  unsuccessful  too,  a  specially  trained  group  of 
telephone  interviewers  will  try  to  contact  the  house- 
hold to  get  a  date  for  an  interview.  About  five 
months  were  calculated  for  the  fieldwork  of  each 
wave,  to  allow  enough  time  for  the  motivation  of 
reluctant  respondents  and  to  wait  for  ill  persons  and 
persons  on  holiday. 

After  each  data  collection,  the  respondents  will  get  a 
letter  with  a  little  present.  After  the  first  wave  they 
got  a  ticket  for  the  lottery.  Some  weeks  before  the 
next  data  collection  starts,  they  also  will  receive  a 
short  report  of  the  study  with  the  announcement  of 
the next wave. Besides motivating respondents, these two
additional contacts serve to update the address
file (13).

Interim  findings  of  the  fieldwork  for  the  first  wave 

The fieldwork for the first wave of the Socio-Economic
Panel will be completed in early October
1984.  Therefore,  at  the  moment,  it  is  only  possible  to 
report  some  interim  findings  and  to  give  an  estimate 
concerning  the  expected  response  rates. 

The  fieldwork  started  with  the  interviewer  training 
and  the  test  interviews  in  February  1984,  and  with 
the  main  part  of  the  data  collection  in  March  1984. 
However,  difficulties  arose  and  led  to  a  delay  of  the 
time  table. 

As mentioned above, many interviewers did not
qualify during the interviewer training. In several
sample points this made it necessary to replace interviewers
and to start the training procedure again.

For sample B of foreigners, it was difficult to
obtain the addresses for all selected sample points
from the respective county administrations. In some
cases,  intensive  discussions  about  data  protection 
were  necessary.  For  some  sample  points  the  addres- 
ses did  not  arrive  in  time,  and,  therefore,  the  field- 
work  in  these  areas  was  delayed  for  several  weeks. 

The sample, as described above, was designed assuming
a share of unusable addresses of about 6 % and
a response rate of about 70 %. The fieldwork showed
that  these  expectations  could  not  be  realized,  at 
least  for  the  moment. 

On the one hand, the response rates proved to be
somewhat lower. Compared with other German surveys,
there will be fewer drop-outs because of households
that were not reached, which shows that the
fieldwork was much more intensive than usual. But
there will be a higher proportion of refusals. The first
findings give the impression that the still virulent
debates about data privacy had a negative effect.
Many households refused with the arguments 'invasion
of privacy' or 'this is another way of conducting the
census cancelled in 1983'.

On the other hand, especially for sample B, many
addresses were unusable. Addresses registered
at the administrations are sometimes obsolete
or wrong. With foreigners, this is a common problem:
persons who have moved are registered at the new place of
residence but not deleted at the administration of the
previous location.

This situation made it necessary to add further addresses to
the sample. The following procedure was used: For all
sample points of sample A where not enough interviews
had been realized, the last address of the
primary address selection was used as a new starting
address and, again, every seventh household was
selected. For sample B, 20 addresses had been drawn
in each sample point, so that there were enough
addresses left to fill up sample points.
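
The top-up step for sample A can be illustrated with a small sketch. It assumes that the households along a route are available as an ordered list and that counting starts at the household after the starting address; both are assumptions made for illustration, and the function name is hypothetical.

```python
def select_every_kth(route, start_index, k=7, n_needed=None):
    """Pick every k-th household on an ordered route, counting from the
    household after `start_index` (the last address of the first selection)."""
    picks = route[start_index + k::k]
    return picks if n_needed is None else picks[:n_needed]


# Hypothetical route of 100 households, identified by their position.
route = list(range(100))
extra_addresses = select_every_kth(route, start_index=20, n_needed=3)
```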

The  following  table  shows  some  interim  findings 
about response rates. As some addresses are still in the
field,  it  is  only  possible  to  give  an  estimate  of  the 
expected  results. 

It  is  obvious  that  sample  B  will  have  much  better 
response  rates  than  sample  A.  This  reflects  previous 
experiences  with  data  collection  on  foreigners.  It  is 
much more difficult to contact foreigners and to
overcome  their  general  suspiciousness.  But  if  this 
point  is  passed,  it  seems  to  be  easier  to  conduct  an 
interview  with  them.  A  possible  explanation  is  that 
foreigners  are  very  afraid  that  the  interviewer  might 
be  an  official  of  the  administration  which  possibly 
might  send  them  back.  If  this  problem  is  solved, 
foreigners  are  much  more  interested  in  talking  about 
their life and less conscious of data protection problems.
However, response rates of Italians are similar
to  the  response  rates  of  Germans,  although  the 
Italians  are  the  best  integrated  group  of  the  guest- 
workers.  This  is  also  a  result  of  other  German  studies 
with  foreigners. 

Compared with the results of other panel studies, the
interim findings for the first wave of the Socio-Economic
Panel are satisfying. Obviously, the first
waves are the most difficult to conduct. Although
the Panel Study of Income Dynamics has very high
response rates in later waves, in the first wave in
1968 the rate was only 76 % (11), (12). The Swedish panel
study on Household Market and Nonmarket Activities
(14) could secure response rates of 85.2 % with the
contact interview and 74.5 % with the personal visit,
both in the beginning of 1984. But these studies are
not  really  comparable,  since  in  the  Panel  Study  of 
Income  Dynamics  only  the  head  of  the  household  is 
interviewed,  while  in  the  Swedish  panel  mostly  head 
and  spouse  are  the  respondents  in  the  household,  and 
only  in  some  cases  a  third  household  member  will  be 
asked. 

The  Survey  of  Income  and  Program  Participation 
(SIPP)  has  a  very  high  response  rate  with  95.2  %  in 
the  first  wave  (15).  The  design  of  this  study  is  similar 
to that of the German Socio-Economic Panel. All
grown-up  household  members  are  respondents.  But 
there  is  an  important  difference  in  the  interviewing 
procedure. Because of the data privacy law of the
Federal Republic of Germany, it was not possible, at
least in the first wave of the German Socio-Economic
Panel, to conduct proxy interviews. In consequence,
only self-interviews were allowed. This was
an extremely difficult situation, especially in large
households. As the interviewers had to contact each
household member until they got an interview or a
refusal, in most cases several household contacts were
necessary. For the interpretation of the response rate,
this means that, depending on the subsample, in 61 % to 69 %
of all households self-interviews with all household
members 16 years and older have been conducted. In
contrast to this, the SIPP allowed proxy interviews,
and usually the interviewers did not have to visit the
household several times to reach each respondent
personally. For respondents who were not present at
the time of the interview, a proxy interview with
another household member was conducted. In consequence,
the response rate of the SIPP of 95.2 %
includes about 40 % proxy interviews and only about
60 % self-interviews. If one keeps this difference
in the interviewing procedure in mind, the response
rates of the German Socio-Economic Panel and the
SIPP are much more similar.
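
The comparison can be made concrete with a rough calculation. The sketch below simply multiplies the two SIPP figures quoted above; it assumes that the 60 % self-interview share applies to the same base as the 95.2 % response rate, which the text does not state, so the result is only indicative.

```python
sipp_response_rate = 0.952  # first-wave response rate quoted in the text
sipp_self_share = 0.60      # "about 60 %" of SIPP interviews were self-interviews

# Rough share of the SIPP sample covered by self-interviews (assumption:
# both figures refer to the same base).
sipp_self_rate = sipp_response_rate * sipp_self_share

# Range reported for the German panel: households in which all members
# aged 16 and older gave self-interviews.
german_rates = (0.61, 0.69)

print(f"SIPP self-interview rate: {sipp_self_rate:.1%}")
print(f"German panel: {german_rates[0]:.0%} to {german_rates[1]:.0%}")
```

Under this assumption the self-interview share of the SIPP is roughly 57 %, which is of the same order as the 61 % to 69 % reported for the German panel.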
The German Socio-Economic Panel

Interim findings concerning the response rates of the first wave

                                           Sample A   Sample B
                                                      Greek  Italian  Yugoslavian  Spanish  Turkish

Planned net sample size 1)                    4,500     200      300          300      200      400

Interim findings:
Addresses initially used                      6,847
Addresses added                               1,139
Total addresses                               7,986
Addresses (Sample B)                                    312      461          467      353      629
Still in the field
Addresses already worked on
Not in the population or wrong addresses
Net addresses
Accepted interviews                           4,425
Nonresponse                                   2,853
Expected sample size 2)                       4,545
Expected response rates 2)                      61%

2) A substantial proportion of the addresses still in the field seem not to be valid household addresses. In many of these
cases, neither the interviewer nor the telephone contact was able to reach a person. A final check via the post
office, which is presently being conducted, shows that the person does not live there anymore, that it is a business address,
etc. Therefore, the total expected number of accepted interviews will not increase according to the interim
figures, but the number of net addresses will be somewhat reduced. However, the response rates will remain more
or less unchanged.
HOUSEHOLD MARKET AND NONMARKET ACTIVITIES. THE FIRST YEAR OF A SWEDISH PANEL STUDY.
N. Anders Klevmarken, University of Göteborg, Sweden.


1.  Purpose and scope.

The  goal  of  the  project  Household  Mar- 
ket and  Nonmarket  Activities  (HUS)  is  to 
conduct  basic  research  on  the  economic  be- 
havior of  Swedish  households.  Economic  re- 
search has  long  been  concerned  with  how 
households  use  their  resources,  such  as 
the  days  per  year  spent  in  gainful  employ- 
ment, the  time  spent  in  housework  and  lei- 
sure, the  proportions  of  income  which  are 
respectively  saved  and  spent  and  which 
goods  and  services  the  households  use. 
All  of  this  has  direct  importance  for  our 
welfare  and  for  the  economic  development 
of  the  countries.  However,  in  order  to  ob- 
tain additional  results  with  this  research 
we  need  better  data  about  households.  For 
instance,  compared  to  its  share  of  GNP  the 
resources  used  to  produce  data  about  the 
household  sector  are  small. 

There  is  a  large  need  to  know  how 
households  adjust  themselves  to  changes 
in  their  economic  situation  both  when  it 
improves  and  when  it  declines.  Some  ex- 
amples of  such  changes  are  new  taxation 
and  welfare  regulations,  changes  in  muni- 
cipal daycare,  new  forms  of  savings  and 
new  economic  conditions  for  various  kinds 
of housing. Cross-sectional data do not
have much to offer for such an analysis;
we need longitudinal data.

Distributional  issues  have  usually 
been  analysed  with  cross-sectional  data, 
but again, for the most interesting issues,
such as the duration of poverty,
longitudinal  data  are  needed. 

Compared  to  previous  research  we  want 
to  attain  above  all  three  improvements 
with  our  project.  The  first  is  to  study 
the  entire  household.  Economic  decisions 
are  frequently  a  family  matter.  Individ- 
ual decisions  are  influenced  by  the  fam- 
ily situation  and  by  other  household  mem- 
bers, and  there  are  joint  decisions  about 
children,  work,  housing  and  leisure.  The 
living standard and well-being of an individual
depend very much on the household
to which this person belongs. For
these  reasons,  we  think  it  is  essential  to 
have  data  about  the  entire  household  to 
study  people's  economic  activities  and 
living  standards  in  a  realistic  way. 

Secondly,  we  intend  to  collect  infor- 
mation which  gives  a  complete  picture  of 
the household members' economic situation
and  planning.  Then  we  can  study  how  house- 
holds' and  their  members'  various  activi- 
ties such  as  gainful  employment,  housing, 
recreation,  consumption  and  savings  be- 
havior depend  upon  each  other  and  inter- 
act, and  how  they  are  affected  by  national 
and  municipal  government  policies.  An  ex- 
ample is  how  the  supply  of  day-care  can 
affect  employment  and  leisure  activities 
for both spouses in the family. We thus
need data which cover all these aspects
of a household's economic activities.

Thirdly, we want to have longitudinal
household data which will improve our
understanding of households' adjustments
to economic policy, changes in their economic
behavior and the duration of economic states.

The  HUS-project  will  give  new  con- 
tributions in  the  following  respects: 

-  it  gives  a  total  picture  of  the 
household's  employment,  purchases, 
savings, household and leisure activities,

-  it  will  be  the  first  investigation  in 
Sweden  in  which  time-use  by  house- 
holds can  be  studied  from  the  econo- 
mic perspective, 

-  it  gives  significantly  improved  capa- 
bility in  studying  how  schooling, 
work  experience,  wages,  family  situa- 
tion and  childcare  affect  the  level 
of  employment  in  the  household, 

-  it  provides  new  potential  in  studying 
role  division  in  couples  with  regard 
to  employment,  housework  and  raising 
children, 

-  it  will  provide  previously  unknown 
information  on  how  much  time  house- 
holds spend  on  the  maintenance  of 
their  housing,  cars,  boats,  etc., 

-  it  makes  it  possible  to  study  which 
factors  are  important  in  the  choice 
of  type  of  housing, 

-  it  makes  it  possible  to  study  how 
economic  welfare  in  the  form  of  con- 
sumption standards  and  wealth  are 
distributed  among  households. 

A  more  detailed  description  of  our 
problems  is  found  in  the  original  research 
program,  Eliasson  &  Klevmarken  (1981). 

The  HUS-project  is  a  joint  operation 
of  researchers  from  the  University  of 
Gothenburg, The Industrial Institute for
Economic and Social Research (IUI), Stockholm,
and the Stockholm School of Economics.

2.   Population,  general  design  and 
data  collected. 

The  first  wave  of  the  HUS-study  aims 
at  the  population  of  households  residing 
in  Sweden  by  the  end  of  January  1984.  In- 
dividuals living  in  institutions  who  did 
not  form  their  own  household  and  did  not 
prepare  their  own  meals  were  excluded. 
Because it is expensive to interview old
people and the response rate would have
been low, households with very old members
were also excluded.

The  units  of  analysis  are  both  the 
household  and  the  individual  household 
member.  The  definition  of  a  household  re- 
sembles that  normally  used  in  consumer 
expenditure  surveys.  Those  who  live  in 
the same dwelling and regularly have
meals together belong to the same household.
Family members who temporarily
live somewhere else are also included.
Since each household will be interviewed
three times during 1984, we will
experience household splits and also new
members joining an old household. The
rules  adopted  for  these  situations  were 
to  follow  all  individuals  selected  for 
an  interview  in  the  first  contact  with 
the  household  unless  they  moved  abroad 
or  into  an  institution.  New  household 
members  are  not  interviewed. 

Since there is no sampling frame of
households or dwellings, but there is a register
of all residents of Sweden, each
household  was  identified  through  a  ran- 
domly selected  individual.  The  house- 
hold to  which  this  individual  belonged 
was  included  in  our  sample.  It  was,  how- 
ever, not  feasible  to  interview  all 
household  members.  Instead  we  decided  on 
a  scheme  where  the  head  of  the  house- 
hold and  his  spouse  were  always  inter- 
viewed. 

If  the  randomly  selected  person  was 
neither  of  these  two,  this  third  person 
was  interviewed  in  addition.  In  this  way 
we  could  ascertain  some  information  about 
other  adults  in  the  household  and  also 
get a "clean" random sample of designated
persons. Individuals below 18 or above 74
years of age, i.e. born before 1910
or after 1966, were excluded from the sampling
frame. The randomly selected persons
were thus all in the age bracket
18-74; other household members, however,
need not be. Children were not interviewed.
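
The selection rules above can be written down compactly. The sketch below is an illustration only; the function names and the use of plain strings to identify persons are assumptions, not part of the HUS documentation.

```python
def in_sampling_frame(birth_year: int) -> bool:
    """Persons born 1910-1966 (aged 18-74 in 1984) could be drawn."""
    return 1910 <= birth_year <= 1966


def respondents(head, spouse, selected):
    """Head and spouse are always interviewed; the randomly selected
    person is added as a third respondent if he or she is neither."""
    chosen = [head]
    if spouse is not None:
        chosen.append(spouse)
    if selected not in chosen:
        chosen.append(selected)
    return chosen


# Hypothetical household in which an adult daughter was drawn from the register.
three = respondents("father", "mother", selected="daughter")
two = respondents("father", "mother", selected="mother")
```

By construction, at most three persons per household are interviewed.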

Each  randomly  selected  person  received 
an introductory letter, which was followed
by a telephone contact. In this
first contact the interviewer informed
the respondent about the survey and asked
for the name and age of each adult household
member and their family relations. Finally,
the interviewer also booked a time for a
personal interview with each respondent. On
the  basis  of  the  information  obtained  in 
this  short  interview  one  household  member 
was  designated  household  head.  In  a 
household with both spouses living together,
the husband was called head. Much
of the information collected concerns economic
facts about the household which the
husband  on  the  average  is  expected  to 
know  more  about  than  his  wife.  In  house- 
holds with  two  or  more  adult  household 
members  but  without  a  married  or  cohabi- 
ting couple  the  person  with  the  highest 
income  was  the  designated  head.  Who  was 
called  head  was  never  communicated  to  the 
respondents,  but  only  used  to  decide  who 
would  get  questions  about  housing  and 
other  issues  not  particular  to  any  single 
household  member. 
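
A minimal sketch of the head designation rule follows. The `Adult` record, its field names and the handling of households with more than one couple (the first husband found is taken) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Adult:
    name: str
    sex: str                       # "m" or "f"
    income: float
    partner: Optional[str] = None  # name of spouse/cohabitant in the household


def designate_head(adults):
    """Husband of a married or cohabiting couple; otherwise the adult
    with the highest income."""
    names = {a.name for a in adults}
    for a in adults:
        if a.sex == "m" and a.partner in names:
            return a
    return max(adults, key=lambda a: a.income)


couple = [Adult("Eva", "f", 120_000, partner="Karl"),
          Adult("Karl", "m", 90_000, partner="Eva")]
siblings = [Adult("Anna", "f", 80_000), Adult("Lars", "m", 60_000)]
```

In the couple the husband is designated head even though his income is lower; among the siblings the higher income decides.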

The  second  step  in  the  field  work  was 
a  personal  interview  with  each  respondent, 
i.e. a maximum of three per household.
This interview was planned for an average
interviewing time of 60 minutes for
the head and 45 minutes for other adults.
The  questionnaire  included  the  following 
sections: 

1.  Family composition
2.  Social background
3.  Schooling
4.  Marital status
5.  Childcare
6.  Health status
7.  Labor market experience
8.  Employment
9.  Job search of unemployed
10.  Not in the labor force
11.  Housing and housing costs
12.  Tenants
13.  Real estate ownership
14.  Cars
15.  Boats
16.  Other durable consumer goods
17.  Incomes and assets.

The information collected in this
personal interview included economic details
about the household like housing
expenditures, mortgages and interest payments,
and various income items and assets
which most people would not be able
to give without consulting notes, bills
and tax forms. Some respondents might also
hesitate to reveal these data because of
their sensitivity. For these reasons the
respondents were asked to give this information
in writing on a questionnaire
which they put into an envelope and
sealed before it was handed over to the
interviewer. The interviewers were instructed
to interview other household
members while waiting for the questionnaire.
In this way no sensitive information
was revealed to the interviewer.

We  had  originally  planned  to  obtain 
most  of  the  information  about  incomes, 
transfer  payments  and  personal  wealth 
from  government  data  registers  via  Statis- 
tics Sweden.  The  government  gave  us  a  per- 
mit to  copy  data  from  the  tax  assessment 
form  of  each  respondent.  However,  to  get 
access  to  these  files  we  needed  the  so- 
cial security  number  of  each  respondent. 
The Data Inspection Board also required
that we obtain the consent of each respondent
to use register data. In a pretest
it soon became clear that it was
very  difficult  to  get  the  social  security 
numbers.  In  Sweden  the  most  commonly  used 
person-id  in  public  and  private  data 
files  is  the  social  security  number,  and 
the public debate about computers and invasion
of personal privacy made respondents
very reluctant to reveal their social
security number. To investigate this
further  some  500  randomly  selected  res- 
pondents (not  included  in  our  HUS-sample) 
were  asked  if  they  preferred  to  give  their 
social  security  number  or  would  rather 
give the information we asked for directly
in a questionnaire. About 27 per cent
answered that they were willing to reveal
their social security number, while 65 per
cent preferred to give the information directly,
5 per cent spontaneously refused to do either,
and 3 per cent did not know. To avoid
a  high  nonresponse  we  thus  had  to  change 
our  plans  and  it  was  decided  that  each 
respondent  would  have  a  choice,  either  to 
answer  questions  about  income,  assets 
etc.  directly  in  writing,  without  reveal- 
ing it  to  the  interviewer,  or  to  give  the 
social  security  number  and  not  answer  the 
income  questions  etc. 

A  disadvantage  with  this  scheme  is 
that  the  data  obtained  from  the  question- 
naires might  not  be  fully  comparable  to 
the  taxfile  data.  To  minimize  this  prob- 
lem direct  reference  was  made  in  the 
questionnaire  to  the  items  of  the  tax 
assessment.  Most  of  the  field  work  was 
also  done  immediately  after  the  tax  as- 
sessment forms  were  submitted  to  the 
authorities.  In  our  judgement  this  prob- 
lem of  comparability  is  minor  compared 
to  the  nonresponse  problem. 

The  possibility  to  compare  the  in- 
come and  wealth  estimates  from  register 
and  interview  data  with  the  corresponding 
population  totals  can  be  used  for  non- 
response  compensation  and  model  valida- 
tion, which  to  some  extent  makes  the  non- 
response  problem  less  severe. 

In addition to the personal interview,
the respondents were contacted
twice for telephone interviews about
their time-use and consumption expenditures.
The method used was an adaptation
of the yesterday question technique previously
used at the ISR, the University
of Michigan. It is perhaps best described
as a one-day retrospective interviewer-administered
diary. The basic idea is
that  the  interviewer  goes  through  the 
past  24  hours  with  the  respondent  and 
asks  him  or  her  to  recall  for  each  ac- 
tivity, when  it  started  and  ended  and  if 
the  respondent  made  any  expenditures  at 
the  same  time.  In  addition  to  their  time- 
use  and  expenditures  the  respondents  were 
in  each  interview  asked  about  their  labor 
market  status.  In  the  last  interview  we 
also  asked  about  purchases  of  durables 
since  the  beginning  of  the  year,  and  how 
frequently  they  had  used  certain  public 
services. 

For  each  household  two  days  were  ran- 
domly selected  from  the  365  days  follow- 
ing the  15th  of  February  1984.  Ideally 
the  time-use  estimates  should  cover  a 
calendar  year  to  match  income  and  wealth 
data, but all Swedes are busy with their
tax assessment forms at the end of January
and the beginning of February, and for
this reason we wished to minimize our
field  work  during  this  period  and  decided 
to  start  collecting  time-use  data  when 
the  tax  assessment  forms  had  been  sub- 
mitted by  February  15. 

A pilot study showed that designated
dates for interviews caused a relatively
high nonresponse because the respondents
were not always available for an interview
on the selected days. Thus, for each
designated date there were also two alternative
dates which could be used if the
respondent could not be reached on the first
day. These alternative dates were selected
on the same weekday one and two
weeks respectively after the designated
date.

The interviewers were told to contact
the respondents on the day following the
designated day. If they could not get an
interview they were to try again on each
of the next two days. If they were still
unsuccessful they were to repeat the
same scheme for the first and second
alternative days. They were not allowed to
conduct interviews with a recall period
longer than three days.
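
The contact rules above can be sketched as a small schedule generator. This is only an illustration under stated assumptions, not the procedure actually used by the field staff: the function name `contact_attempts` and its parameters are hypothetical, the alternative days are taken to be exactly 7 and 14 days after the designated day, and no adjustment for holidays is made.

```python
from datetime import date, timedelta


def contact_attempts(designated, alternative_gap_days=(7, 14), tries_per_day=3):
    """Return (diary_day, contact_day) pairs in the order they would be tried.

    For the designated day and then for each alternative day, the
    interviewer calls on the following day and, if unsuccessful, on the
    two days after that, so recall never exceeds three days.
    """
    diary_days = [designated]
    diary_days += [designated + timedelta(days=gap) for gap in alternative_gap_days]

    schedule = []
    for diary_day in diary_days:
        for lag in range(1, tries_per_day + 1):
            schedule.append((diary_day, diary_day + timedelta(days=lag)))
    return schedule


# Example: a hypothetical designated day, Monday 5 March 1984.
for diary_day, contact_day in contact_attempts(date(1984, 3, 5)):
    print(diary_day.isoformat(), "asked about on", contact_day.isoformat())
```

The list has nine entries at most: three contact attempts for the designated day and three for each of the two alternative days.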

The personal interview should in general
precede the two telephone interviews.
This plan was chosen because it is
easier  to  explain  a  survey  in  person  than 
by  telephone  and  in  this  way  the  inter- 
viewer would  not  be  a  complete  stranger 
to  the  respondents  in  the  time-use  inter- 
views. For  practical  reasons  we  were, 
however,  not  always  able  to  follow  this 
plan.  Some  telephone  interviews  had  to 
be  made  from  the  telephone  unit  of  the 
survey  institute  SIFO  in  Stockholm.  A  few 
were  also  made  before  the  personal  inter- 
views. 

3.   Pretest  experiences. 

In  April  and  May  of  1982  we  made  a 
rather  extensive  pilot  study  based  on  a 
random  sample  of  315  households  from 
Western  Sweden.  There  were  five  main  pur- 
poses of  this  study,  namely,  to 

a)  compare different methods of collecting
expenditure and time-use data,

b)  get  estimates  of  response  rates  and 
an  idea  of  what  might  be  important 
for  the  response, 

c)  test  the  questionnaires, 

d)  develop  coding  and  editing  procedures, 

e)  train  the  project  staff  in  the  entire 
survey  operation. 

The  results  from  this  pilot  study  have 
been  reported  in  Klevmarken  (1982,  1983). 
Here  follows  only  a  very  brief  summary. 

There  were  altogether  three  contacts 
with  each  household.  The  first  one  was 
a  short  contact  interview  by  telephone 
with  a  randomly  selected  person  to  estab- 
lish the  household  composition  and  to  ask 
a  few  demographic  questions.  Then  two  in- 
terviews followed  with  each  respondent  in 
each  household.  The  same  rules  for  desig- 
nated respondents  were  used  as  explained 
above.  One  interview  was  personal  and  one 
was  made  by  telephone.  In  addition,  leave 
behind  expenditure  diaries  were  adminis- 
tered to  each  respondent  and  leave  behind 
time-use  diaries  to  a  few  respondents. 

The response rate in the major contacts
was as low as 50-55 per cent, which
is much lower than we would find acceptable
in a main study. In short, we ascribe
this result at least partly to the
ambitious design, the short time span during
which the field work had to be done
and the budget constraints, which permitted
neither payments to the respondents nor
expensive nonresponse follow-ups.
Our conclusion was that improvements in
the design and the use of response-stimulating
measures should make it possible to
increase the response rate.

Our  nonresponse  analysis  gave  the 
following  additional  results: 

o   The  initial  nonresponse  was  rather 
high.  This  was  probably  the  combined 
effect  of  the  following  features: 
(i)    The  survey  was  introduced  by 
telephone  rather  than  in  a 
personal  visit, 
(ii)   In  this  telephone  interview 
we  asked  for  family  composi- 
tion and  previous  marriages 
and  living  arrangements  which 
some  respondents  might  have 
found  invasive, 
(iii)  When the interviewer concluded
the interview by explaining the
design of the study many respondents
found the work load
too high. This shows that the
first interview should be in
person and the telephone contact
preceding it should not
be used to ask questions, only
to make arrangements for the
first interview.

o   A  major  drop  in  the  response  rate 
also  occurred  immediately  after  the 
contact  interview,  i.e.  many  res- 
pondents refused  to  keep  an  expendi- 
ture diary.  Leave  behind  diaries  tend 
to  increase  nonresponse.  In  this  case 
a  better  result  might  have  been  ob- 
tained if  the  relative  simplicity  of 
the  diary  had  been  demonstrated  by 
the  interviewer  in  a  personal  visit. 
In  the  pilot  study  the  diary  was  ex- 
plained in  the  initial  telephone  con- 
tact and  then  mailed  to  the  respond- 
ents. 

o   Old  respondents  showed  a  relatively 
high  nonresponse  in  those  parts  of 
the  survey  which  involved  relatively 
more  work,  i.e.  diaries  and  long  in- 
terviews about  time-use.  For  this 
reason  we  decided  not  to  include  very 
old  persons  in  the  main  survey. 


o   There was no indication of a strong
relationship between nonresponse and
income or socioeconomic group.


o   Refusals made up a very large share
of the nonresponse. This indicated
that we would have to do a much better
job of explaining the importance
of the survey and also provide some
personal incentive to obtain better
cooperation.

Results  from  tests  of  alternative 
data  collection  methods  can  be  summarized 
in  the  following  way. 

For  almost  all  commodities  the  yes- 
terday question  technique  gave  smaller 
estimates  of  average  expenditures  than 
leave  behind  diaries.  Since  we  have  no 
reason  to  expect  that  leave  behind  dia- 
ries would  give  overestimates  this  result 
shows  that  yesterday  questions  in  the 
form used in the pilot study tend to
underestimate household expenditures.
However,  in  the  main  study  we  have  im- 
proved the  methodology  by  deleting  a 
few  supplementary  questions  for  each  ac- 
tivity, by  adding  after  the  time-use 
sequence  a  few  questions  about  expendi- 
tures previously  not  mentioned  and  by 
giving  stricter  rules  for  how  the  ques- 
tions should  be  asked.  We  thus  hope  that 
the  underreporting  problem  is  reduced. 

Even if the yesterday questions do
not  give  systematic  errors,  expenditures 
recorded  only  for  a  few  days  for  each 
respondent  give  unreliable  estimates.  If 
the  shopping  pattern  during  the  week  is 
approximately  the  same  for  all  commodi- 
ties then  it  might  be  possible  to  adjust 
the  sampling  design  to  this  pattern  and 
in  this  way  increase  the  efficiency  some- 
what. It  is,  however,  not  likely  that 
this  gain  in  efficiency  would  become  so 
high  that  a  longer  observation  period  for 
less  frequent  purchases  would  not  be 
needed.  In  the  main  study  we  have  thus 
supplemented  the  last  time-use  interview 
with  questions  about  purchases  of  major 
durables. 

The  yesterday  question  technique  to 
collect  time-use  information  has  worked 
relatively  well  once  the  interviewers  got 
used  to  it.  The  time-use  questionnaire 
requires  much  more  training  than  a  tradi- 
tional interview  briefing  gives.  The 
pilot  study  did  not  include  a  comparison 
with  the  closest  alternative,  a  self  ad- 
ministered leave  behind  diary,  but  a  com- 
parative evaluation  of  these  two  methods 
would  be  useful  for  future  data  collec- 
tion. All  we  have  been  able  to  do  so  far 
is  to  compare  estimates  of  time-use  in 
aggregate  activities  for  the  United 
States  and  Finland  with  our  own  estimates. 
There  is  a  striking  similarity  in  the 
time-use  pattern  between  the  three  coun- 
tries (Flood,  1983).  We  have  also  com- 
pared the  response  to  yesterday  questions 
with  that  of  retrospective  questions  cov- 
ering two  weeks.  Similar  to  results  from 
other  studies  we  found  that  retrospective 
questions  for  a  longer  period  tend  to 
give  systematic  errors.  Time-use  for  less 
frequent activities was underreported
compared  to  the  results  from  yesterday 
questions. 

The  pilot  study  also  included  a  com- 
parison between  estimates  of  time  off 
work  at  work  from  the  time-use  diary  and 
from  direct  questions  about  time  normal- 
ly used  for  meals  and  coffee  breaks, 
personal errands etc. Longer hours were
on average reported for "normal"
time  off  work  at  work  than  for  the  cor- 
responding activities  from  the  yester- 
day question  diary. 

Whether telephone interviews could be used
instead of personal interviews was another
issue investigated in the pilot
study.  Our  experiences  show  that  a  dif- 
ficult and  demanding  study  like  ours 
should  be  introduced  to  the  respondents 
in  person.  If  not,  the  nonresponse  rate 
is  likely  to  increase.  For  respondents 
we  could,  however,  find  no  significant 
difference  in  time-use  or  expenditures 
between  interviews  made  in  person  and 
those  made  by  telephone. 

4.   Sample  design. 

The  sample  was  obtained  by  a  two- 
stage  cluster  design.  The  clusters  were 
stratified  by  a  rather  unconventional 
procedure.  First,  Sweden  was  divided  into 
nine  major  geographical  areas.  Within 
each  area  a  cluster  analysis  was  run  on 
the  zipcode  areas  to  group  them  into 
strata.  For  each  zipcode,  variables  were 
used  measuring  the  age,  income  and  occu- 
pational distributions,  the  share  of  the 
population  living  in  owner  occupied 
houses  and  the  share  of  foreigners.  The 
zipcodes  were  stratified  in  four  to  eight 
strata  depending  on  geographical  area. 

The  primary  selection  unit  was, 
however,  not  the  zipcode  but  commune  or 
more  precisely,  those  zipcodes  which  be- 
long to  a  commune  within  a  stratum.  We 
preferred  to  use  this  primary  selection 
unit  rather  than  zipcode  to  reduce  the 
geographical  dispersion  of  the  sample. 
If  we  had  used  zipcode  we  would  have 
needed  more  interviewers  than  we  could 
possibly recruit and train. From each
stratum  two  primary  units  were  selected. 

In  the  second  sampling  stage  a  num- 
ber of  individuals  were  chosen  from  each 
primary  unit  of  zipcode  areas.  As  a  sam- 
pling frame  we  used  the  SPAR  register, 
which  is  a  register  of  all  residents  of 
Sweden.  We  were  aiming  at  a  self-weighted 
sample  with  about  10  individuals  in  each 
primary  unit.  With  a  sample  size  of  2200 
this  required  a  little  less  than  200  in- 
terviewers, which  approximately  is  the 
size  of  the  interviewer  staff  of  the  SIFO 
institute  which  did  our  fieldwork. 

In  practice  it  was  not  possible  to 
get  a  completely  self-weighted  sample. 
The  population  figures  of  the  SPAR  reg- 
ister were  about  half  a  year  old  and  the 
effective  sample  size  of  each  primary 
selection  unit  deviated  a  little  from  10. 


In  all  we  obtained  2131  individuals.  We 
thus  used  a  sample  of  2131  households  and 
expected  to  interview  on  the  average  1.65 
respondents  per  household,  totaling  ap- 
proximately 3500  respondents. 
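
As a quick check of the figures in this section, the arithmetic can be written out as follows. This is only a worked illustration; the variable names are ours and the numbers are those quoted in the text.

```python
# Planned design: about 10 individuals per primary unit, sample size 2200.
planned_sample_size = 2200
individuals_per_primary_unit = 10
planned_primary_units = planned_sample_size / individuals_per_primary_unit

# Realized sample: 2131 households, 1.65 expected respondents per household.
households = 2131
respondents_per_household = 1.65
expected_respondents = households * respondents_per_household

print(planned_primary_units)        # number of primary units implied by the plan
print(round(expected_respondents))  # close to the "approximately 3500" in the text
```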

Our  budget  did  not  permit  a  larger 
sample  although  it  would  have  been  de- 
sirable to  increase  its  size.  Our  possi- 
bilities to  analyze  subgroups  of  house- 
holds will  now  become  somewhat  limited. 
For  instance  we  will  get  relatively  few 
unemployed, academically trained or wealthy
people. 

To  make  feasible  an  inference  to  the 
annual  time-use  of  the  Swedish  population 
we  decided  to  use  a  random  day  design. 
This  design  should  ideally  take  advantage 
of  seasonal  and  weekly  variations  in 
time-use.  Efficiency  calculations  made 
for  the  time-use  studies  at  ISR,  the 
University  of  Michigan  (Karlton  [1983]) 
indicate  that  a  sample  of  two  days  might 
be  sufficient.  The  marginal  increase  in 
efficiency of additional days decreases
rapidly.  If  it  is  desirable  to  draw  con- 
clusions about  a  particular  season  or, 
for  instance,  calculate  within  individual 
variance  estimates  for  weekdays,  more 
than  two  days  are  needed.  Budget  consid- 
erations, however,  limited  our  alterna- 
tives considerably.  We  could  not  afford 
more  than  a  sample  of  two  days  for  each 
respondent.  Given  this  constraint,  dif- 
ferences in  time-use  between  seasons  and 
between  workdays  and  holidays  were  built 
into  the  design  in  the  following  way. 

The  calendar  year  was  split  into  a 
winter  and  a  summer  season  and  the  days 
of  a  week  were  divided  into  workdays  and 
weekends  or  holidays.  The  365  days  of  a 
year  were  thus  grouped  into  four  strata. 


                   Workday     Weekend
                               & holidays

Winter season         A           B

Summer season         B           A

The sample of households was randomly
divided into two halves, A and B. For
each household in group A a winter season
workday and a summer season Saturday,
Sunday or holiday were randomly drawn, and
for each household in group B a winter
season Saturday, Sunday or holiday and a
summer season workday. In each stratum
days  were  drawn  with  equal  probabilities 
and  without  replacement. 
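
The random day design can be illustrated with a short sketch. Several things here are assumptions made only for the example and are not taken from the study: the season boundary (May to October counted as summer), the empty holiday calendar, and all function names. For simplicity the sketch draws each household's days independently with equal probability within a stratum; it does not reproduce the drawing without replacement described above.

```python
import random
from datetime import date, timedelta


def is_summer(day):
    """Hypothetical season rule: May through October is 'summer'."""
    return 5 <= day.month <= 10


def build_strata(start, n_days, holidays=frozenset()):
    """Group the survey days into the four season-by-day-type strata."""
    strata = {(s, k): [] for s in ("winter", "summer") for k in ("workday", "free")}
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        season = "summer" if is_summer(day) else "winter"
        kind = "free" if day.weekday() >= 5 or day in holidays else "workday"
        strata[(season, kind)].append(day)
    return strata


def assign_days(household_ids, strata, rng):
    """Split households at random into halves A and B and draw two days each."""
    ids = list(household_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    cells = {
        "A": [("winter", "workday"), ("summer", "free")],
        "B": [("winter", "free"), ("summer", "workday")],
    }
    plan = {}
    for position, household in enumerate(ids):
        group = "A" if position < half else "B"
        plan[household] = (group, [rng.choice(strata[c]) for c in cells[group]])
    return plan


# The 365 days following 15 February 1984, as in the text.
strata = build_strata(date(1984, 2, 16), 365)
plan = assign_days(range(10), strata, random.Random(0))
```

Each household receives one winter day and one summer day, and one workday and one free day, so that both halves together cover all four strata.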

We  also  tried  to  balance  the  sample 
of  days  with  respect  to  the  strata  of 
clusters  and  to  the  two  selected  primary 
units  from  each  stratum. 
5.  Fieldwork considerations.

In addition to the normal interviewer
training which SIFO provides for its
interviewers, each interviewer had to
participate in a one-day training session
specific to the HUS project. In
all, about 20 sessions were held around
the country. During these sessions the
interviewers were informed about the purpose
and scope of the project and about
the details of the field work procedure.
They also practiced with the HUS
questionnaires. Before the field work started
each interviewer also had to do test
interviews in the field with the questionnaires
for the head of the household.
These questionnaires were sent in to
SIFO, reviewed and corrected, and returned
to the interviewers. Those interviewers
who were not able to produce
satisfactory test interviews did not
participate further.

From  our  pretests  we  know  that  non- 
response  is  our  major  problem.  Since  the 
present  study  is  to  become  the  first 
wave  of  a  future  panel  study  it  is  essen- 
tial to  have  a  high  response  rate.  For 
this reason it was decided to give the
respondents some kind of remuneration.
Our budget did not allow for cash payments
but we have tried a combination
of gifts and a lottery. At the end of the
personal interview the interviewer gave
the respondent a list of a few small
gifts; one item, for instance, was a box
of chocolates. The respondent could either
choose one of the gifts from this list or
participate in a lottery to win a flight
ticket for two persons to Paris. This
design not only serves the purpose
of reducing nonresponse, but will also
give interesting data about people's
choice behaviour under uncertainty. Each
respondent  will  also  receive  a  summary  of 
the  main  results  from  the  survey. 

6.  Response.

At the time of writing the fieldwork
with the contact interviews is completed
and only three personal interviews remain,
while most of the telephone interviews
still have to be done. Table 1
shows the fieldwork log as of the 19th of
June, with the nonresponse broken down by
reason as of this date.

No interviews were attempted with
those labeled "Not in the population".
The  response  rate  for  attempted  contact 
interviews  is  75.5  per  cent  and  for  at- 
tempted personal  interviews  74.6  per 
cent.  Nonresponse  is  almost  entirely  due 
to  refusals.  One  might  also  note  that 
there  was  almost  no  increase  in  nonres- 
ponse in  the  telephone  interviews  follow- 
ing the  personal  visit. 


7.   Concluding remarks.

After  the  first  year  of  fieldwork  we 
will  have  data  about  labour  market  status 
from  three  different  interviews  for  each 
responding  household  member.  This  will 
become  our  first  piece  of  longitudinal 
information.  Our  plans  are  to  return  to 
the  same  households  in  the  beginning  of 
1986  to  obtain  data  on  changes  in  house- 
hold composition,  and  new  data  on  hous- 
ing, labour  force  participation,  earn- 
ings, incomes  and  assets.  New  time-use 
and  expenditure  data  will  probably  not 
be  collected  until  later. 

Footnotes. 

1)  If the designated day was a workday
(holiday)  but  any  of  the  correspond- 
ing weekdays  one  and  two  weeks  later 
was  a  holiday  (workday)  the  closest 
following  workday  (holiday)  was 
chosen  as  an  alternative  day. 
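
The rule in footnote 1 can be sketched in a few lines. The function names and the holiday calendar are hypothetical and serve only to illustrate the rule; weekends are treated here as holidays, in line with the day strata used in the sample design.

```python
from datetime import date, timedelta


def is_free_day(day, holidays):
    """Saturdays, Sundays and listed holidays count as free days."""
    return day.weekday() >= 5 or day in holidays


def alternative_day(designated, weeks_later, holidays=frozenset()):
    """Alternative day of the same type (workday or free day) as the designated day.

    Start from the same weekday `weeks_later` weeks after the designated
    day and, if its type differs, move forward to the closest following
    day of the right type.
    """
    wanted = is_free_day(designated, holidays)
    candidate = designated + timedelta(weeks=weeks_later)
    while is_free_day(candidate, holidays) != wanted:
        candidate += timedelta(days=1)
    return candidate


# Illustrative calendar with a single holiday on Monday 23 April 1984.
example_holidays = frozenset({date(1984, 4, 23)})
first_alt = alternative_day(date(1984, 4, 16), 1, example_holidays)
second_alt = alternative_day(date(1984, 4, 16), 2, example_holidays)
```

With this calendar, a designated workday on Monday 16 April gets Tuesday 24 April as its first alternative, because the Monday one week later is a holiday, and Monday 30 April as its second.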
Table 1.  Non-response by reason August 20,

                                   Contact    Personal   Telephone interview
                                   interview  visit         I         II

Not in the population
  Living abroad                        21         27
  Moved, address unknown               35         41
  Not Swedish speaking                 25         55
  Unhealthy, institution               16         22
      "      at home                   23         37
  Other                                20         42
  Subtotal                            140        224       230        221

Nonresponse
  Refusal, personal integrity          29         54
     "     too busy                    31         67
     "     never participates          17         33
     "     participated before         19         37
     "     interview workload
           too high                    12         25
     "     not interested              35         57
     "     after attempted
           personal persuasion        165        285
     "     other                      123        275
  Not available                        57         63
  Other                                 0          0
  Subtotal                            488        896       886        867
  (of which additional non-
  response in the telephone
  interviews)                                              (29)       (7)

Accepted interviews                  1504       2635      1758        396

Still in the field                      0          0       881       2271

Total sample size                    2132       3755      3755       3755

Response rate (%)                    75.5       74.6

Note 1.  The first column shows the number of randomly
selected  persons  which  equals  the  number  of 
households.  The  last  three  columns  show  the 
number  of  individuals  designated  to  participate. 
For  those  households  which  did  not  complete  the 
contact  interview  the  number  of  designated  respond- 
ents is  unknown.  These  households  have  in  columns 
2-4  been  enumerated  as  single  person  households. 

Note  2.  The  response  rate  in  the  last  row  of  the  table 
is  the  ratio  between  the  number  of  accepted 
interviews  and  the  total  sample  size  less 
those  not  in  the  population. 
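
The definition in Note 2 can be checked against the table with a few lines of Python. The function name is ours; the counts are those given in Table 1 for the contact interview and the personal visit.

```python
def response_rate(accepted, total_sample, not_in_population):
    """Accepted interviews as a percentage of the sample that is in the population."""
    return 100.0 * accepted / (total_sample - not_in_population)


contact_rate = response_rate(accepted=1504, total_sample=2132, not_in_population=140)
personal_rate = response_rate(accepted=2635, total_sample=3755, not_in_population=224)

print(round(contact_rate, 1), round(personal_rate, 1))
```

Rounded to one decimal, these reproduce the response rates of 75.5 and 74.6 per cent quoted in the text.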
THE AUSTRALIAN NATIONAL LONGITUDINAL SURVEY
Ian  McRae,  Bureau  of  Labour  Market  Research 


1.  Introduction

The  Australian  National  Longitudinal 
Survey  (ANLS)  is  essentially  a  survey  of 
the  labour  market,  and  the  data  it 
collects  will  focus  principally  on 
explaining  the  workings  of  the  Australian 
labour  market.  The  survey  is  based  on  a 
large  probability  sample.  It  will  collect 
data  which  relates  a  variety  of  areas  not 
usually  covered  in  a  single  collection. 
This,  in  conjunction  with  its 
longitudinal  nature,  will  make  it  a  very 
valuable  source  of  data  for  Australian 
researchers  and  policy  advisers. 

When  the  survey  is  in  full  operation 
it  will  include  12  000  people  selected 
from  two  sampling  frames.  Of  these  3  000 
will  be  selected  from  a  list  of 
registered  unemployed  and  9  000  from  an 
area  based  sample.  Data  will  be  collected 
on  a  wide  variety  of  issues,  but  with  a 
main  focus  on  the  labour  market  and 
unemployment.  The  scope  of  the  survey  is 
restricted  to  people  aged  15-24  in  1984, 
in  order  to  allow  a  detailed  analysis  of 
this  group. 

This  paper  has  been  structured  as 
follows.  Section  2  briefly  discusses  the 
background  to  the  decision  to  conduct  a 
longitudinal  survey.  Section  3  describes 
how  decisions  regarding  broad  design 
issues  were  taken,  based  on  the  principal 
objectives  of  the  survey.  Section  4 
describes  some  of  the  secondary 
objectives,  which  include  the  use  of  the 
data  in  evaluation  of  Government  labour 
market  programs.  Evaluation  is  discussed 
in  more  detail  in  Section  5.  Section  6 
outlines  the  data  required  to  meet  these 
objectives.  Sections  7,  8  and  9  describe 
the  sample  design  and  estimation 
procedures  proposed  for  the  survey. 
Section  10  notes  briefly  some  practical 
aspects  of  data  collection,  and  Section 
11  makes  brief  comment  on  the  methods  of 
analysis  likely  to  be  used. 

2.  Background 

In  January  1983  the  Australian  Government 
was  considering  a  range  of  means  of 
assisting  the  long  term  unemployed.  In 
the  course  of  the  deliberations  they 
became  aware,  not  only  of  the  dearth 
of  information  relating  to  this  group, 
but  that  certain  key  issues  could  not  be 
resolved  from  the  traditional  cross- 
sectional  data  available.  Subsequently 
funds  were  allocated  for  the  conduct  of  a 
longitudinal  survey  of  at  least  3  years 
with  the  primary  purpose  of  learning  more 
about  the  long-term  unemployed,  and  the 
Bureau  of  Labour  Market  Research  (BLMR) 
was  asked  to  conduct  the  survey. 
It  should  be  pointed  out  that 
Australia  is  generally  very  well  supplied 
with  labour  market  data.  The  government 
statistical  agency,  the  Australian  Bureau 


of  Statistics  (ABS)  conducts  a  monthly 
labour  force  survey  of  a  national  sample 
of  32  000  households.  This  survey 
encompasses  all  the  conventional  labour 
market  data,  and  for  8  or  9  months  of  the 
year  includes  a  supplementary  survey.  The 
supplementary  surveys  cover  a  wide 
variety  of  labour  market  and  social 
issues,  including  alternative  working 
arrangements,  child  care,  labour 
mobility,  labour  force  experience,  income 
and  working  conditions. 

In  addition  to  the  monthly  survey, 
ABS  conducts  a  series  of  Special 
Supplementary  Surveys.  These  surveys 
cover,  for  example,  household 
expenditure,  housing,  health  and 
handicapped  persons.  They  tend  to  be  much 
larger  and  more  complex  than  the 
supplements  to  the  monthly  survey  and 
they  may  be  used  to  focus  on  particular 
sub-groups  within  the  population. 

The  monthly  labour  force  survey  is 
not  well  suited  to  longitudinal  analysis. 
Month  to  month  gross  flows  data  are 
published.  However  the  mobility  of  the 
population  is  such  that  attrition  is 
quite  high  (10%  of  the  population  cannot 
be  matched  from  month  to  month),  and  the 
rotation  pattern  is  such  that  no  one  is 
in  sample  for  more  than  8  months. 

There  have  been  a  number  of 
longitudinal  surveys  conducted  in 
Australia,  but  none  on  a  large  scale,  in 
the  sense  of  covering  large  numbers  of 
both  people  and  data  items.  Most  of  the 
more  prominent  studies  have  been  based  on 
school  leavers,  with  an  initial  interview 
at  school,  and  mail  follow  ups  at 
different  periods  (see  Blandy  and 
Richardson  1982,  Williams  1981,  and 
Dowling  and  O'Brien  1979).  These  surveys 
have  been  extremely  valuable,  but  it  has 
not  been  possible  for  them  to  closely 
control  attrition  nor  to  ask  a  wide  range 
of  questions.  There  have  been  no 
substantial  longitudinal  studies  of  the 
unemployed  or  the  wider  community. 

March  1983  saw  a  change  in 
Government  in  Australia,  with  the  new 
Labor  Government  being  committed  to  large 
scale  job-creation  schemes.  In  developing 
these  schemes,  however,  the  policy  makers 
had  requirements  for  more  information 
than  was  available  about  the  unemployed. 
This  led  to  confirmation  of  interest  in 
longitudinal  data  and  the  initiation  of  a 
major  longitudinal  survey  directed  at 
data  on  policy  issues. 

The  first  wave  of  the  survey  (which 
encompasses  only  one  quarter  of  the  total 
sample)  will  go  into  the  field  in 
September  1984,  with  the  introduction  of 
the  remainder  of  the  sample  being  planned 
for  1985.  The  current  plans  encompass 
surveys  in  1984,  1985,  1986  and  1987. 
3.  Broad Design Issues
Given  the  origins  of  this  survey,  the 
initial  objective  was  to  consider  the 
following  questions: 

who  are  the  long-term  unemployed 
(not  only  in  demographic  terms  but  in 
terms  of  many  other  aspects  of  their 
background)? 

how  do  they  become  long-term 
unemployed  (or  conversely,  which  of  those 
who  become  unemployed  get  jobs  most 
quickly)? 

how  do  people  change  with  increasing 
durations  of  unemployment  -  with  respect 
to  job  search  methods,  physical  and 
mental  health,  finances  etc.? 

There  are  a  number  of  ways  in  which 
these  objectives  can  be  met,  and  a  number 
of  different  international  'models'  were 
considered.  Two  key  issues  were  involved 
in  these  considerations  -  firstly  the 
general  form  of  the  survey,  and  secondly 
(given  the  general  form  chosen)  the 
choice  of  the  cohort  to  be  surveyed. 
These  are  addressed  in  turn  below. 

The  first  approach  considered  was  to 
select  a  sample  from  the  current  stock  of 
unemployed,  and  follow  their  experiences 
through  time  (see  for  example  White 
1983).  However  this  approach  has  three 
deficiencies.  First,  it  does  not  allow  us 
to  examine  the  process  of  becoming  long- 
term  unemployed,  although  this 
information  is  vital  in  determining 
policies  to  influence  this  process. 

Second,  this  approach  does  not 
provide  a  control  group.  While  population 
information  from  the  Census  and  Labour 
Force  surveys  provides  suitable 
comparative  information  for  some  issues, 
for  others  these  sources  provide 
inadequate  detail.  Third,  after  a  number 
of  years  the  sample  will  no  longer 
consist  solely  of  unemployed  people.  The 
value  of  this  eventual  sample  structure 
is  far  from  clear.  A  survey  was  therefore 
required  which  provided  adequate  samples 
of  long  and  short-term  unemployed, 
employed  and  not  in  labour  force. 

There  are  a  number  of  major  surveys 
which  have  this  coverage.  The  most 
prominent  of  these  are  the  US  National 
Longitudinal  Surveys  (NLS)  (see,  Borus 
1981)  and  the  Panel  Study  of  Income 
Dynamics  (PSID)  (see,  Morgan  1974).  The 
former  uses  national  samples  of 
individuals  within  particular  age/gender 
cohorts.  The  latter  is  based  on  a  sample 
of  families,  and  includes  people  of  all 
ages  and  both  genders. 

The  principal  objectives  of  this 
survey  relate  to  unemployment  and  other 
issues  of  relevance  to  the  labour  market, 
and  hence  to  the  behaviour  and  experience 
of  individuals.  It  was  therefore 
considered  that  an  approach  which  was 
based  on  individuals  was  more 
appropriate.  This  does  not  mean  that 
information  on  families  will  be 
neglected,  and  in  fact  a  large 
amount  of  data  about  other  family  members 


is  being  collected.  It  does  mean  however 
that  the  individual  is  the  sampling  unit, 
and  the  unit  which  is  to  be  tracked. 

The  next  issue  in  the  broad  design 
was  to  consider  whether  the  survey  should 
cover  all  age/gender  groups,  or  be 
constrained  to  a  single  cohort.  In 
Australia  about  half  the  unemployed  and 
about  half  the  long-term  unemployed  are 
young  (under  25).  Any  attempt  to  sample 
uniformly  across  all  ages  would  therefore 
mean  half  of  the  sample  would  be  spread 
across  a  forty  year  age  range,  allowing 
little  scope  for  detailed  analysis  of  any 
particular  subgroup. 

After  considerable  debate  on  both 
technical  and  policy  grounds  it  was 
decided  that  the  survey  should 
concentrate  on  a  single  youth  cohort, 
defined  by  the  age  range  15-24  years.  As 
noted  earlier  this  group  comprises  about 
half  the  unemployed,  and  until  recently 
it  accounted  for  nearly  90  per  cent  of 
government  expenditure  on  training  and 
employment  programs  for  the  unemployed. 
Examination  of  this  group  has  the  added 
advantage  that  it  allows  examination  of 
the  process  of  transition  from  school  to 
work,  and  covers  the  age  range  within 
which  a  majority  of  Australians  marry  and 
many  have  their  first  child. 

The  sample  will  cover  both  people 
who  are  long-term  unemployed  and  those 
who  are  not.  This  can  most  easily  be  done 
by  selecting  an  area  sample  of  the 
defined  age  group.  Only  about  10  per  cent 
of  such  a  sample  would  be  unemployed, 
however,  and  about  half  of  these  long- 
term.  This  is  inadequate  for  analytic 
purposes,  so  there  was  a  need  to  over- 
represent  the  unemployed.  This  could  be 
done  by  oversampling  areas  likely  to 
contain  unemployed  persons,  as  is  done  in 
the US NLS (CHRR 1982). This would
increase  the  numbers  of  unemployed  in  the 
sample  not  only  in  the  first  wave  but 
also  in  later  waves,  assuming  areas 
retain  their  employment  characteristics. 
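
To see why a pure area sample is inadequate, the proportions quoted above can be applied to the 9 000 persons planned for the area sample (the figure given in the Introduction). The sketch below is only a worked illustration; the variable names are ours and the shares are the approximate ones stated in the text.

```python
area_sample = 9000                   # persons in the area based sample
unemployed_share = 0.10              # "about 10 per cent" unemployed
long_term_share_of_unemployed = 0.5  # "about half of these" long-term

expected_unemployed = area_sample * unemployed_share
expected_long_term = expected_unemployed * long_term_share_of_unemployed

print(round(expected_unemployed), round(expected_long_term))
```

On these rough figures an area sample alone would contain only some hundreds of long-term unemployed, which is the motivation for over-representing the unemployed.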

An  alternative  was  to  select  a  list 
sample  of  known  unemployed  in  the  first 
survey.  Although  this  may  not  retain  such 
high  unemployment  levels  in  the  sample  in 
later  years  as  the  first  approach,  the 
number  of  unemployed  in  the  first  wave 
would  be  controlled,  and  the  number  of 
people  moving  into  long-term  unemployment 
in  the  second  wave  is  maximised.  There 
are  two  extensive  lists  of  unemployed 
people  in  Australia,  that  held  by  the 
Commonwealth  Employment  Service  (CES)  and 
that  held  by  the  Department  of  Social 
Security  (DSS).  For  reasons  of  both 
convenience  of  access  and  greater 
comprehensiveness,  the  CES  list  is  to  be 
used.  The  use  of  this  list  also  provides 
a  sample  which  represents  the  registered 
unemployed  and  hence  provides  an 
opportunity  for  analysis  of  the  activity 
of  the  employment  service.  Finally,  the 
use  of  a  relatively  large  list  sample 
provides  an  opportunity  to  experiment 
with 'tracking down' individuals. As
there  have  been  very  few  substantial 
longitudinal  surveys  in  Australia,  there 
is  very  limited  experience  in  this 
process. 

The  outcome  of  these  considerations 
was  a  sample  of: 

3  000  persons  aged  15-24  who  had 
been  registered  with  the  Commonwealth 
Employment  Service  (CES)  for  at  least 
3  months.  (Group  1) 

9  000  persons  selected  from  an  area 
sample  which  covers  all  but  very 
sparsely  settled  areas  of  Australia. 
(Group  2) 

(Sample  sizes  are  discussed  in 
Section  7.) 

The  survey  is  to  be  implemented  as 
follows: 

                    Group 1        Group 2

September 1984      First Wave
June-Aug 1985                      First Wave
September 1985      Second Wave
June-Aug 1986                      Second Wave
September 1986      Third Wave
June-Aug 1987                      Third Wave
September 1987      Fourth Wave

The separation of the first waves of the two parts of the survey was
made for various reasons, which were all essentially practical:

relatively little time and staff resources were available for survey
development;

the survey was to be conducted by a commercial agency (the reasons for
this are discussed below) and even the small survey was a large job by
their standards;

it was considered necessary to test the quality of this work,
particularly with respect to tracking, before launching the major part
of the survey;

as the principal focus of the survey was to be unemployment, a
relatively large and detailed study of the unemployed was of
considerable value in its own right;

there was considerable policy interest in obtaining data as early as
possible, so it was desirable to begin the survey in 1984.

4. Survey Objectives

As  noted  in  the  previous  section,  the 
prime  focus  of  analysis  from  this  survey 
will  be  long-term  unemployment.  The 
processes  of  entry  and  exit  to  this  state 
and  the  changes  observable  in  people  as 
unemployment  duration  lengthens  will  be 
examined. However, there are many other
potential  uses  for  the  data.  The 
organisations  with  whom  there  has  been 
substantial  contact  include  the 
Department  of  Education  and  Youth 
Affairs,  whose  main  interest  is  in  those 
who leave education 'early'. The
Commonwealth Employment Service is also
interested in using this survey to
examine the use of, and attitudes to, its
services.

The  survey  will  allow  a  great  deal 
of  analysis  of  labour  market,  educational 
and  social  welfare  issues.  These 
include: 

analysis of flows between employed/unemployed/not in labour force
states, to answer the question of whether there are many people
unemployed for a short time or a few people unemployed for a long time

examination of the process of transition from school to work

examination of patterns of usage of the Commonwealth Employment Service

study of usage of government employment and training programs and
their long-term outcomes

studies of links between poverty and unemployment - who are the poor?
do they stay poor?
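
The flows analysis listed first can be illustrated with a small sketch. It assumes each wave is held as a mapping from a person identifier to a labour force state; the function name, the identifiers and the example records are hypothetical and are not taken from the survey.

```python
from collections import Counter

STATES = ("employed", "unemployed", "not in labour force")


def gross_flows(wave1, wave2):
    """Count state-to-state transitions for persons observed in both waves."""
    flows = Counter()
    for person, origin in wave1.items():
        if person in wave2:  # persons lost to attrition are skipped
            flows[(origin, wave2[person])] += 1
    return flows


# Hypothetical records for two consecutive waves.
wave1 = {"a": "unemployed", "b": "unemployed", "c": "employed",
         "d": "not in labour force"}
wave2 = {"a": "employed", "b": "unemployed", "c": "employed",
         "e": "unemployed"}

flows = gross_flows(wave1, wave2)
for origin in STATES:
    for destination in STATES:
        print(origin, "->", destination, flows[(origin, destination)])
```

A table of this kind separates people who remain unemployed between waves from people who leave and are replaced by new entrants, which is the distinction the question about "many people for a short time or a few people for a long time" turns on.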

Given  the  very  high  cost  of  the 
survey,  which  could  be  as  high  as  one  and 
a  half  million  dollars  per  year  in  its 
peak  years,  its  potential  needs  to  be 
exploited  as  far  as  possible.  Thus  while 
the  survey  cannot  be  thought  of  as  an 
omnibus  to  be  used  freely,  and  data  will 
only  be  included  which  can  benefit  from 
the  longitudinal  nature  of  the  data  base, 
it  is  intended  to  provide  access  to  it  as 
a  research  vehicle  for  both  government 
and  non-government  research. 

5. Government Program Evaluation

A  particular  objective  of  the  survey  is 
to  provide  information  for  evaluation  of 
government  employment  and  training 
programs.  There  are  two  aspects  to  this 
use.  For  most  programs,  participants  will 
be  asked  a  very  limited  set  of  questions 
about  their  participation  in  programs  and 
the  benefits  they  perceive  from  this 
participation.  The  largest  job-creation 
program  in  Australia,  the  Community 
Employment  Program  (or  CEP),  will  be 
treated  differently.  The  evaluation  of 
this  program  was  initially  envisaged  as 
drawing  on  a  separate  longitudinal 
survey,  as  a  principal  evaluation  issue 
was  whether  participation  enhanced  long- 
term  employment  outcomes.  However,  there 
is  substantial  commonality  in  data 
requirements, and the timings proposed for
the evaluation and for the longitudinal
survey (ANLS) were similar. In the
interests  of  efficiency  (particularly 
with  respect  to  survey  development  and 
management)  the  two  surveys  have  been 
merged.  As  far  as  interviewers  are 
concerned,  there  is  one  survey,  with 
different  question  streams  for  CEP 
participants,  and  the  rest.  As  far  as 
analysts  are  concerned  there  are  two 
surveys  -  the  ANLS  and  the  CEP 
evaluation.  The  ANLS  will  serve  as  a 
control  group  for  the  evaluators  in  some 
contexts.  The  full  structure  of  the  first 
wave  of  the  survey  is  shown  in  Table  1 
attached. 

The discussion which follows relates
to the ANLS only, but the actual survey
development  and  management  have  been 
substantially  influenced  by  the  need  to 
provide  an  appropriate  framework  for  CEP 
evaluation. 

6. Data Requirements

The  data  requirements  for  this  survey  are 
broad. The survey was initially conceived
as a basis for policy-oriented labour
market  research,  but  there  is 
considerable  interest  from  many  sections 
of  government  and  academia.  The  data  are 
constrained  by  a  requirement  that  the 
interview  be  restricted  to  about  one 
hour.  This  restriction  is  somewhat 
arbitrary,  but  was  based  on  experience  of 
surveys  in  this  country.  This  suggested 
that  interviews  which  run  longer  than  an 
hour  are  more  likely  to  experience  severe 
'respondent  fatigue',  which  may  lead  to 
eventual  refusal  to  complete  the 
interview. 

The  central  data  requirements  of  the 
survey  were  defined  by  the  objectives 
outlined  in  Sections  3  and  4.  These  were 
refined  and  added  to  in  an  exhaustive 
round  of  consultations  with  potential 
users  of  the  survey  data  in  many  arms  of 
government  and  in  the  wider  research 
community.  The  list  of  data  items  in 
Table  2  attached  reflects  these 
consultations. 

Thus  the  basic  demographic  data  for 
the  respondent  and  their  family  will  be 
collected,  plus  detailed  educational 
information  including  an  attempt  to 
assess  "on-the-job"  training.  Detailed 
employment  histories  are  to  be  collected 
for  the  period  covered  by  the  survey,  and 
the  12  months  preceding  the  first 
interview.  Broad  data  on  the  three  years 
preceding  this  period  will  also  be 
collected.  Current  labour  force  data  will 
be collected consistent with the concepts,
definitions and patterns used by the
Australian  Bureau  of  Statistics  in  their 
Special  Supplementary  Surveys,  plus  data 
from  the  unemployed  on  the  methods  they 
use  to  seek  work  and  the  wage  they  seek. 
Limited  data  on  health,  both  physical  and 
mental,  will  be  sought,  plus  quite 
detailed  income  data,  and  somewhat 
limited  assets  data.  Several  questions 
will  also  be  asked  about  satisfaction 
with  working  conditions. 

Participants  in  Government  employment 
and  training  programs  will  be  asked  about 
these programs, with a special, more
detailed  set  of  questions  for  CEP 
participants.  The  unemployed  will  be 
asked  about  their  usage  of,  and 
satisfaction  with,  the  Commonwealth 
Employment  Service. 

It  is  proposed  to  limit  the  questions 
on  attitudes,  aspirations,  and 
expectations.  It  is  proposed  to  ask  about 
attitudes  to  women  working  and  about 
educational  plans,  though  it  is  not 
proposed  to  ask  about  occupational 
expectations.


7. Sample Size

Sample  size  considerations  invariably 
involve  compromise  between  'needs'  and 
cost/management  considerations.  The 
'need'  in  this  case  is  to  be  able  to 
examine small sub-groups. For example, of
the 3 000 list sample, about:

1 400 will be unemployed 9 months or more
500 of these will be female
250 of these will be aged 15-19
125 will be in their first year out of school

Examining  this  particular  group  of 
young  females  with  respect  to  their 
educational  levels,  whether  they  live 
with  parents,  income,  employment 
experience,  country  of  origin  etc  is 
difficult  because  of  the  sample  sizes. 
This  is,  however,  clearly  a  group  with 
quite  specific  needs  and  will  be 
studied.

With  the  9  000  area  sample  there 
will  be  reasonably  large  numbers  still  at 
school  initially,  who  can  be  'tracked'  as 
they  enter  the  workforce.  However, 
between  year  one  and  year  two  of  the  area 
sample it is expected that about:

80 will leave school aged 15
220 will leave school aged 16
250 will leave school aged 17

The 15 and 16 year olds will not
have  completed  secondary  school,  although 
some  of  the  17  year  olds  will  have 
completed.  The  analysis  of  the  eventual 
labour  market  outcomes  of  some  of  these 
groups  by  gender  and  birthplace  will  be 
difficult  because  of  the  very  small 
sample  sizes. 

Thus,  sample  sizes  were  not 
established  in  a  highly  scientific  way, 
but  were  rather  seen  as  a  trade-off 
between  a  survey  size  which  could  be 
managed  on  one  hand,  and  'adequate' 
samples  of  important  sub-groups  on  the 
other.  While  expected  standard  errors  of 
important  variables  were  calculated, 
considerations  such  as  those  shown  above 
played  the  major  role  in  sample  size 
considerations.  The  examples  indicate 
that  the  final  sizes  chosen  are  not 
excessive.
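
As a rough illustration of why the smaller sub-groups are hard to analyse, the sketch below computes the standard error of an estimated proportion at the sub-group sizes quoted above. It assumes simple random sampling inflated by a design effect; the `deff` parameter and the proportion of 0.5 are illustrative assumptions, not figures from the survey design.

```python
import math


def se_proportion(p, n, deff=1.0):
    """Standard error of a proportion p estimated from n respondents.

    deff is an assumed design effect allowing for clustering
    (1.0 corresponds to simple random sampling).
    """
    return math.sqrt(deff * p * (1.0 - p) / n)


# Sub-group sizes taken from the list sample example in this section.
for n in (1400, 500, 250, 125):
    print(n, round(se_proportion(0.5, n), 3))
```

Under these assumptions a proportion near one half estimated from the 125 recent school leavers has a standard error of roughly 4.5 percentage points before any allowance for clustering, so further breakdowns of that group would be very imprecise.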

8.  Sample  Design 

The  sample  design  for  this  survey  is 
constrained  by  the  usage  of  the 
Commonwealth  Employment  Service  (CES) 
register  as  the  framework  for  the  first 
stage.  This  register  is  not  a  unitary 
file,  but  rather  a  series  of  files  held 
in  each  of  250  CES  offices.  The  only 
practical  sample  design  is  therefore  a 
multi-stage  design,  selecting  first  the 
CES  offices,  and  second  the  persons 
registered  at  those  offices. 

The  sample  is  to  be  allocated 
proportionately  to  the  six  States  and  two 
Territories,  as  the  main  requirement  is 
for  national  rather  than  state  level 
data.  Within  States  the  sample  was 
stratified into metropolitan/non-metropolitan
areas, with further
stratification  of  the  large  metropolitan 
areas.  The  smaller  metropolitan  areas  and 
non-metropolitan  areas  were  carefully 
ordered,  and  not  divided  into  strata. 
These  steps  led  to  20  strata. 

The  sample  will  be  selected  in  two 
and,  when  relevant,  three  stages.  Firstly 
CES  offices  will  be  chosen  and  then 
people  registered  at  the  offices.  If  this 
group  is  too  geographically  dispersed  (as 
can  happen  in  country  areas  in 
particular),  a  sub-selection  of  a 
geographical  cluster  will  be  made.  The 
sample  will  consist  of  60  clusters  each 
of  50  persons. 
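
The two-stage selection described above can be sketched as follows. The text does not say how offices are to be drawn; this sketch assumes systematic selection with probability proportional to the number of registrants, applied to the ordered list of offices, followed by a simple random sample of 50 registrants per selected office. The office sizes, the seed and the function names are hypothetical, and the optional third stage (geographic sub-clustering) is not shown.

```python
import random


def pps_systematic(sizes, n, rng):
    """Pick n units from an ordered list with probability proportional to size."""
    step = sum(sizes) / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cumulative, k = [], 0.0, 0
    last = len(sizes) - 1
    for i, size in enumerate(sizes):
        cumulative += size
        # A unit is hit once for each selection point inside its interval;
        # the "i == last" guard absorbs any floating-point shortfall.
        while k < n and (points[k] < cumulative or i == last):
            chosen.append(i)
            k += 1
    return chosen


def two_stage_sample(office_sizes, n_offices=60, per_office=50, seed=1984):
    """Stage 1: offices by PPS; stage 2: random registrants within each office."""
    rng = random.Random(seed)
    offices = pps_systematic(office_sizes, n_offices, rng)
    return {o: rng.sample(range(office_sizes[o]), per_office) for o in offices}


# Hypothetical register sizes for 250 offices (all smaller than the
# sampling step, so no office can be selected twice).
size_rng = random.Random(0)
office_sizes = [size_rng.randint(400, 1400) for _ in range(250)]

sample = two_stage_sample(office_sizes)
print(len(sample), sum(len(people) for people in sample.values()))
```

Because the offices are drawn from an ordered list, a systematic selection of this kind spreads the sample across the ordering, which is one way the careful ordering of the smaller metropolitan and non-metropolitan areas could substitute for explicit strata.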

The  area  sample  will  be  based  on  the 
same  geographic  regions,  both  for  reasons 
of  minimising  interviewer  travel  and  to 
provide  a  suitable  control  group  for  the 
sample  of  registrants.  While  the  level  of 
clustering  is  yet  to  be  finalised,  and  it 
may  prove  practical  to  spread  the  sample 
further,  the  current  proposal  is  to 
select 10 blocks with an average of 15
respondents  from  each  area  controlled  by 
a  selected  CES  office. 

The  selection  of  the  area  sample 
will  involve  a  screening  phase,  but  as  it 
is  expected  that,  on  average,  one  15-24 
year  old  person  will  be  found  in  every 
second  house  this  will  not  be  excessive. 
All  eligible  persons  found  in  the 
screening  are  to  be  interviewed. 
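
The figures quoted for the area sample can be checked with some simple arithmetic. The sketch assumes that the 60 list-sample clusters correspond to 60 selected CES office areas, which the text does not state explicitly; the variable names are illustrative.

```python
AREA_SAMPLE = 9000            # Group 2 sample size
SELECTED_OFFICES = 60         # assumption: one list-sample cluster per office
BLOCKS_PER_OFFICE = 10        # current proposal
RESPONDENTS_PER_BLOCK = 15    # average respondents per block
ELIGIBLE_PER_DWELLING = 0.5   # one 15-24 year old in every second house

# Sample size implied by the proposed level of clustering.
implied_sample = SELECTED_OFFICES * BLOCKS_PER_OFFICE * RESPONDENTS_PER_BLOCK

# Screening workload implied by the expected eligibility rate.
dwellings_to_screen = AREA_SAMPLE / ELIGIBLE_PER_DWELLING
dwellings_per_block = RESPONDENTS_PER_BLOCK / ELIGIBLE_PER_DWELLING

print(implied_sample, dwellings_to_screen, dwellings_per_block)
```

On these assumptions the proposed clustering reproduces the 9 000 target exactly, and the screening phase would cover about 18 000 dwellings, or about 30 per block.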

As  a  consequence  there  will  be 
households  where  more  than  one  person  is 
included  in  the  sample.  This  approach  was 
chosen  as  it  will  give  samples  of  both 
husband  and  wife  and  also  of  siblings.  It 
is  also  cheaper,  but  will  have  a  higher 
variance  for  some  estimates,  and  response 
problems  may  arise  if  too  much  time  is 
requested of the household. This last
problem  has  not  yet  been  tested  in  the 
field,  and  is  the  one  aspect  of  this  plan 
which  may  require  reconsideration. 

The  sample,  once  selected,  is  to  be 
followed  over  a  number  of  years.  There  is 
very  little  Australian  experience  of 
tracking  in  household  surveys  and  it  is 
not  clear  what  levels  of  sample  attrition 
are  likely.  American  experience  of  sample 
attrition  is  very  varied.  The  NLS  older 
cohorts  have  retained  between  68  per  cent 
and  78  per  cent  of  their  sample  after  10 
years  (see  O'Neill,  1983).  The  new  youth 
cohort  has  maintained  96  per  cent  of  the 
original  sample  for  each  year  of  its 
conduct  so  far  (Rhoton,  1984).  The 
University  of  Michigan  Panel  Study  of 
Income  Dynamics  lost  14  per  cent  of  the 
original  sample  between  the  first  and 
second  waves,  but  suffered  very  small 
losses  thereafter  (see  Survey  Research 
Centre,  1983).  In  the  survey  of  the 
unemployed  conducted  in  the  UK,  the 
retention  from  1980  to  1981  was  78  per 
cent  (White  1983). 

Australia  has  quite  high  levels  of 
mobility  among  the  young,  particularly 
the unemployed, but does not have the
same range of public registers as
countries like the USA. The main tracking
mechanism will be the names and addresses
of family and friends supplied by the
person at the initial interview. Beyond
this there are phone books and electoral
rolls, neither of which is likely to be
up to date - particularly with respect to
the group of interest. There are some
possibilities  of  using  other  government 
registers,  but  these  vary  from  area  to 
area  and  have  not  yet  been  fully 
explored.  Every  effort  will  be  made  to 
retain  contact  with  respondents  in  the 
period  between  surveys  but  until  the 
process  is  actually  begun  it  is  not  known 
how  successful  this  will  be. 

It  is  intended  to  make  every  effort 
to  minimise  attrition.  People  anywhere 
in  Australia  will  be  included  in  the 
sample,  although  in  more  remote  areas 
this  may  be  by  telephone.  People  who 
leave  Australia  will  be  omitted  from  the 
sample.  Each  year  efforts  will  be  made  to 
contact  the  full  original  sample,  so 
respondents  may  miss  out  one  year  and  be 
re-included  the  following  year. 

As  noted  above,  no  other  Australian 
study  has  had  the  resources  to  follow  up 
respondents  in  the  way  which  is  proposed 
for  this  survey.  What  the  final  attrition 
rates  will  be  is  strictly  a  matter  for 
conjecture  at  this  stage. 

9. Estimation Procedures

Estimation  procedures  will  be  considered 
further  when  the  pilot  tests  have  been 
conducted.  The  notes  below  indicate 
current  thinking. 

Basically,  estimation  will  always 
relate  to  the  population  at  June  1984. 
The  simplest  version  will  be  for  the  1984 
survey,  which  will  be  a  list  sample  only. 
Within  each  stratum  the  numbers  of 
persons  registered  by  age,  sex  and 
duration  of  registration  will  be 
available. Selection probabilities (say
P_hi) for each person (i) selected in
stratum h will be known. Response rates
within certain groups will also be
known.

The most important groupings expected to show differential
non-response are those who do and those who do not move house between
selection and interview. Information on marital status and country of
origin will also be available. In the discussion which follows it is
assumed that moving is the main cause of non-response. If this is not
the case, the same methods would still apply.

Assuming that all sampled persons can be identified as 'movers' or
'non-movers', the sample will be post-stratified into these
categories. Separate estimation and non-response adjustment would then
be undertaken for each post-stratum. Information on age, sex, and
duration of unemployment will be available for the complete
population. As both response rates and attributes are likely to be
related to these factors, these data may be used as benchmarks.
The current plan, which will be finalised using the results of the
pilot test, is that within each selection stratum:

(a) separate estimates (using conventional unequal probability
estimators) would be made for 'movers' and 'non-movers';
(b) these estimates would then be adjusted for non-response;
(c) further adjustments would then be made to known age-sex-duration
of unemployment benchmarks.
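
The three steps of the current plan can be sketched in terms of weights for a single selection stratum. This is only one way of organising the calculation under stated assumptions: the record layout, the cell labels and the benchmark counts are hypothetical, and the non-response adjustment shown (a ratio of selected to responding weight within each post-stratum) is a common choice rather than one specified in the text.

```python
from collections import defaultdict


def final_weights(sample, benchmarks):
    """Return final weights for respondents in one selection stratum."""
    # (a) Base weights: inverse of the known selection probability.
    base = {p["id"]: 1.0 / p["prob"] for p in sample}

    # (b) Non-response adjustment, separately for movers and non-movers.
    selected = defaultdict(float)
    responding = defaultdict(float)
    for p in sample:
        selected[p["mover"]] += base[p["id"]]
        if p["responded"]:
            responding[p["mover"]] += base[p["id"]]
    adjusted = {
        p["id"]: base[p["id"]] * selected[p["mover"]] / responding[p["mover"]]
        for p in sample if p["responded"]
    }

    # (c) Scale to known benchmark totals for each age-sex-duration cell.
    cell_of = {p["id"]: p["cell"] for p in sample}
    cell_totals = defaultdict(float)
    for pid, weight in adjusted.items():
        cell_totals[cell_of[pid]] += weight
    return {
        pid: weight * benchmarks[cell_of[pid]] / cell_totals[cell_of[pid]]
        for pid, weight in adjusted.items()
    }


# Hypothetical stratum: four sampled persons, one non-responding mover.
sample = [
    {"id": 1, "prob": 0.1, "mover": False, "responded": True, "cell": "M15-19"},
    {"id": 2, "prob": 0.1, "mover": False, "responded": True, "cell": "F15-19"},
    {"id": 3, "prob": 0.1, "mover": True, "responded": True, "cell": "M15-19"},
    {"id": 4, "prob": 0.1, "mover": True, "responded": False, "cell": "F15-19"},
]
benchmarks = {"M15-19": 24, "F15-19": 12}  # hypothetical register counts

weights = final_weights(sample, benchmarks)
print(weights)
```

After step (c) the weights within each cell sum to the benchmark count for that cell, so estimates of register totals by age, sex and duration agree with the known figures.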

When the area sample is introduced, several variations to this
approach will be needed. Armstrong (1978) provides a very thorough
discussion of methods for estimating from multi-frame surveys. The
following discussion briefly describes the methods to be applied in
this case. Firstly, probabilities of selection will change. In
particular, all people who were eligible for selection in the list
sample will have two opportunities to be selected - in the list sample
and in the area sample. Thus there will be three groups of people in
the final sample:

i) Those selected in the list sample.
ii) Those selected in the area sample but eligible for the list sample.
iii) Those selected in the area sample and not eligible for the list
sample.

While the probabilities for group iii) may be calculated directly,
those for groups i) and ii) are more complex. In both these cases the
total probability of selection (where h denotes selection stratum) is:

Prob(selected) = Prob(in list sample) + Prob(in area sample) - Prob(in both samples)

               = P_h + P_A - P_h P_A

Given that the area sample in any region will be three times as big as
the list sample, and denoting by u the unemployment/population ratio
in the area, and f the proportion of unemployed people in that region
included in the list sample, it can be shown that:

Prob(person i is selected) = (1 + 3u(1 - f)) P_h

Thus at one extreme, with an unemployment/population ratio of 15 per
cent and f of .025:

P(i selected in R) = 1.45 P_h

More likely numbers are 10 per cent and .06, giving:

P(i selected in R) = 1.28 P_h

In some rural areas f will be very large, perhaps as large as 0.5. In
this case, if u = 0.1:

P(i selected in R) = 1.15 P_h


Thus  the  joint  probabilities  can  be 
quite  variable  depending  on  the 
situation. 
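
The multiplier in the expression above can be evaluated directly. One reading that reproduces it, offered here as an assumption rather than as the authors' derivation, is to take the list-sample probability as approximately f, the area-sample probability as 3u times the list-sample probability, and the two selections as independent. The function and parameter names below are illustrative.

```python
def selection_multiplier(u, f, area_to_list_ratio=3.0):
    """Factor applied to the list-sample probability P_h.

    u: unemployment/population ratio in the area
    f: proportion of the area's unemployed included in the list sample
    area_to_list_ratio: size of the area sample relative to the list sample
    """
    return 1.0 + area_to_list_ratio * u * (1.0 - f)


# The three scenarios discussed in the text.
scenarios = {
    "high unemployment": (0.15, 0.025),
    "more likely": (0.10, 0.06),
    "rural, large f": (0.10, 0.5),
}
for label, (u, f) in scenarios.items():
    print(label, round(selection_multiplier(u, f), 3))
```

The second and third scenarios give 1.28 and 1.15 as quoted. For the first scenario the formula as printed evaluates to about 1.44, slightly below the quoted 1.45, which is the value obtained when the (1 - f) term is ignored.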

The  actual  estimation  procedure  for 
the  area  sample  will  obviously  be 
different  to  the  list  sample  in  the  first 
year,  as  the  notion  of  'movers'  will  not 
apply. Straightforward age-sex
adjustment  at  the  smallest  geographic 
level  for  which  benchmark  data  is 
available  will  be  used.  This  will 
probably  be  at  the  state  level,  possibly 
classified  into  metropolitan  and  non- 
metropolitan.  In  later  years,  separate 
non-response  adjustment  for  movers  and 
non-movers  would  also  be  applied. 

As  noted  earlier,  this  approach 
indicates  current  thinking.  If  further 
investigation  reveals  response  rates  are 
much  the  same  for  movers  and  non-movers 
the  process  could  be  simplified.  The 
number  of  age-sex-duration  cells  is  yet 
to  be  considered,  but  may  be  very  few. 
Adjustments  which  are  proposed  for 
stratum  level  could  be  undertaken  at 
state  or  even  national  levels  to  reduce 
complexity.  These  issues  have  not  yet 
been  thoroughly  addressed. 

Given  the  clustering  of  the  sample 
and  the  complexity  of  the  estimation 
process,  variance  estimation  will  clearly 
need  to  be  of  a  replicated  nature. 
Exactly  how  it  will  be  done  is  yet  to  be 
decided,  although  balanced  repeated 
replication  techniques  (see  McCarthy, 
1966  or  Kish  &  Frankel,  1970)  are 
suitable. 
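
A minimal sketch of balanced repeated replication for a weighted total follows. It assumes, which the text does not state, that the clusters have been grouped into strata containing exactly two primary sampling units each. The Hadamard matrix is built by the Sylvester construction, and the function names and example totals are hypothetical.

```python
def hadamard(order):
    """Sylvester construction; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_variance(psu_totals):
    """Estimate a total and its variance by balanced half-samples.

    psu_totals: one (t1, t2) pair of weighted PSU totals per stratum.
    """
    n_strata = len(psu_totals)
    order = 1
    while order < n_strata + 1:  # column 0 (all +1) is not used
        order *= 2
    signs = hadamard(order)
    full = sum(t1 + t2 for t1, t2 in psu_totals)
    squared = 0.0
    for row in signs:
        # Each replicate keeps one PSU per stratum and doubles its weight.
        replicate = sum(
            2 * t1 if row[h + 1] == 1 else 2 * t2
            for h, (t1, t2) in enumerate(psu_totals)
        )
        squared += (replicate - full) ** 2
    return full, squared / order


# Hypothetical weighted PSU totals for three strata.
total, variance = brr_variance([(10.0, 14.0), (7.0, 5.0), (20.0, 26.0)])
print(total, variance)
```

For a linear estimator such as a total, full balance makes the replicate variance equal to the sum of squared within-stratum differences, which gives a simple check on an implementation. Non-linear statistics, including estimates that pass through the non-response and benchmark adjustments, would be recomputed in full for each half-sample.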

10. Fieldwork and Data Processing

Data  collection  and  processing  of  the 
ANLS  will  be  undertaken  by  a  commercial 
market  research  company.  As  yet  there  are 
no  large  scale  agencies  in  Australia  like 
the  social  survey  research  organisations 
in  America.  The  larger  Australian  market 
research  companies  do  however  undertake 
social  research,  and  have  substantial 
experience  of  such  work. 

Efforts  are  being  made  to  involve 
the  Federal  Government's  official 
statistical  agency,  the  Australian  Bureau 
of  Statistics  (ABS)  in  later  stages  of 
the  survey.  The  ABS  is  the  largest  and 
most  competent  survey  organisation  in  the 
country,  and  while  it  has  not  been  able 
to  participate  so  far,  may  be  in  a 
position  to  provide  field  work  and  other 
assistance  in  the  future. 

The  survey  is  planned  as  purely 
personal  interview.  There  may  be  some  use 
of  the  telephone  for  people  who  have 
moved  to  particularly  remote  places,  but 
this would be quite restricted. There is
also  some  possibility  of  switching  to 
telephone  interviewing  in  later  years, 
but  the  survey  until  1987  has  been 
planned  and  costed  on  the  basis  of 
personal  interview. 

The  consultant  will  be  contracted  to 
provide  BLMR  with  a  clean  data  tape,  and 
data processing will be undertaken at
BLMR. It is proposed to manage the data
files  using  the  SIR  data  management 
system,  and  to  undertake  most  of  the 
analysis  using  SAS.  It  is  hoped  that  even 
variance  estimation  will  be  possible  in 
SAS,  as  programs  for  Balanced  Repeated 
Replication  have  already  been  published. 

11. Analysis Proposed

In  Sections  3  and  4  a  series  of  issues 
are  noted  which  BLMR  and  other  agencies 
will  wish  to  address  in  a  variety  of 
ways.  Table  3  attached  lists  a  variety  of 
other  issues  for  research  which  will  be 
given  priority.  As  it  is  planned  to  make 
the  data  base  publicly  available  as  early 
as  possible  it  is  likely  much  analysis 
will  be  progressively  undertaken  outside 
BLMR. 

The  initial  publications  will  be 
aimed  at  providing  a  description  of  the 
population  being  sampled,  and  those 
aspects  of  their  history  to  be  collected 
by  recall.  As  longitudinal  data  becomes 
available  even  relatively  simple 
descriptive  statistics  will  contain  a 
great deal of information. In the initial
analysis it is not proposed to use highly
sophisticated techniques. These
techniques will no doubt be considered at
a later stage and in the context of
particular research projects.

12. Conclusion

The  Australian  National  Longitudinal 
Survey  is  an  ambitious  and  expensive 
undertaking.  It  will  provide  a  wealth  of 
data  which  has  not  been  available 
previously  in  Australia,  both 
longitudinal  and  cross-sectional,  and  has 
the  capacity  to  make  a  substantial 
contribution  to  both  policy  making  and 
research in this country.

Table 1.  Survey Structure 1984

Sample Component      15-24 years                    25+ years

ANLS                  Random sample of those         -
                      registered 3 months with
                      CES (3 000)

CEP - Participants    Random sample of CEP           Random sample of CEP
                      participants - constrained     participants - constrained
                      by geographic clustering       by geographic clustering
                      of ANLS (500)                  of ANLS (500)

CEP Control           Matched sample of              Matched sample of
                      registrants who are            registrants who are
                      eligible for CEP               eligible for CEP
                      positions (500)                positions (500)

Total sample size for 1984 - 5 000 persons


Table  2.  Data  Items:  Broad  Headings 

Demographic  variables  relating  to  the 
respondent  and  other  residents  of  the 
respondent's  household 

Family  Background 

Marital  History  and  Fertility 

Education 

Impediments  to  Employment 

Current Labour Force Status - Job search methods and reservation wage

Job History - last 12 months

Work Experience Prior to '12 months' before

Health - Physical and Psychological

Participation in and Satisfaction with Government Programs

Income and Assets

Satisfaction with Working Conditions


Table 3.  Selected Issues for Research

To determine the concomitants of the lengthening of unemployment
spells. Are all socioeconomic, age, and gender groups equally at risk?

To identify the labour market consequences of long-term unemployment.
Are youth permanently disadvantaged after long-term unemployment or
does this experience have a positive impact on later labour market
outcomes?

To determine the changes occurring in an individual's attitudes and
behaviour over the duration of a spell of unemployment. Issues
considered include finances, housing, family formation, educational
expectations, retraining and occupational and geographic mobility.

To investigate the differences in incidence and impact of long-term
unemployment and frequent short spells of unemployment. Information is
needed on the social and demographic differences in those experiencing
the two types of unemployment, on the effects of both types on later
labour market success, and on the effect of the business cycle on
these patterns of unemployment.

To examine whether the incidence of long-term unemployment is
different for males than it is for females. To examine attributes such
as education, work experience and geographic mobility in relation to
male and female labour force experiences.

To investigate different school to work transition patterns, and the
relative importance of factors involved in successful transition.

To examine the role of vocational training in labour market
experience, especially as regards trades careers.

To gain insights into the ways in which permanent occupations are
found.

There are conflicting hypotheses about the relatively frequent job
changes by youth. Evidence is needed to assess the causes and effects
of these job changes.

To assess the impact of different job search methods on the duration
of job search, job acceptance, and subsequent earning levels and job
satisfaction.